Freedman [Adv. in Appl. Math. 40 (2008) 180-193; Ann. Appl.
Stat. 2 (2008) 176-196] critiqued ordinary least squares regression ad- 
justment of estimated treatment effects in randomized experiments, 
using Neyman's model for randomization inference. Contrary to con- 
ventional wisdom, he argued that adjustment can lead to worsened 
asymptotic precision, invalid measures of precision, and small-sample 
bias. This paper shows that in sufficiently large samples, those prob- 
lems are either minor or easily fixed. OLS adjustment cannot hurt 
asymptotic precision when a full set of treatment-covariate interac- 
tions is included. Asymptotically valid confidence intervals can be 
constructed with the Huber-White sandwich standard error estimator.
Checks on the asymptotic approximations are illustrated with
data from Angrist, Lang, and Oreopoulos's [Am. Econ. J.: Appl. 
Econ. 1:1 (2009) 136-163] evaluation of strategies to improve college 
students' achievement. The strongest reasons to support Freedman's 
preference for unadjusted estimates are transparency and the dangers 
of specification search. 



1. Introduction. One of the attractions of randomized experiments is 
that, ideally, the strength of the design reduces the need for statistical mod- 
eling. Simple comparisons of means can be used to estimate the average 
effects of assigning subjects to treatment. Nevertheless, many researchers 
use linear regression models to adjust for random differences between the 
baseline characteristics of the treatment groups. The usual rationale is that 
adjustment tends to improve precision if the sample is large enough and the 
covariates are correlated with the outcome; this argument, which assumes 
that the regression model is correct, stems from Fisher (1932) and is taught 
to applied researchers in many fields. At research firms that conduct ran- 
domized experiments to evaluate social programs, adjustment is standard 
practice. 1 

1 Cochran (1957), Cox and McCullagh (1982), Raudenbush (1997), and Klar and Darlington (2004) discuss precision improvement. Greenberg and Shroder (2004) document the use of regression adjustment in many randomized social experiments.

In an important and influential critique, Freedman (2008ab) analyzes the
behavior of ordinary least squares regression-adjusted estimates without as- 
suming a regression model. He uses Neyman's (1923) model for randomiza- 
tion inference: treatment effects can vary across subjects, linearity is not 
assumed, and random assignment is the source of variability in estimated 
average treatment effects. Freedman shows that (i) adjustment can actually 
worsen asymptotic precision, (ii) the conventional OLS standard error estimator
is inconsistent, and (iii) the adjusted treatment effect estimator has a
small-sample bias. He writes [Freedman (2008a)], "The reason for the break- 
down is not hard to find: randomization does not justify the assumptions 
behind the OLS model." 

This paper offers an alternative perspective. Although I agree with Freedman's
(2008b) general advice ("Regression estimates . . . should be deferred
until rates and averages have been presented"), I argue that in sufficiently 
large samples, the statistical problems he raised are either minor or eas- 
ily fixed. Under the Neyman model with Freedman's regularity conditions, 
I show that (i) OLS adjustment cannot hurt asymptotic precision when 
a full set of treatment × covariate interactions is included, and (ii) the
Huber-White sandwich standard error estimator is consistent or asymptot- 
ically conservative (regardless of whether the interactions are included). I 
also briefly discuss the small-sample bias issue and the distinction between 
unconditional and conditional unbiasedness. 

Even the traditional OLS adjustment has benign large-sample properties 
when subjects are randomly assigned to two groups of equal size. Freedman 
(2008a) shows that in this case, adjustment (without interactions) improves 
or does not hurt asymptotic precision, and the conventional standard error 
estimator is consistent or asymptotically conservative. However, Freedman 
and many excellent applied statisticians in the social sciences have summa- 
rized his papers in terms that omit these results and emphasize the dangers 
of adjustment. For example, Berk et al. (2010) write: "Random assignment 
does not justify any form of regression with covariates. If regression adjust- 
ments are introduced nevertheless, there is likely to be bias in any estimates 
of treatment effects and badly biased standard errors." 

One aim of this paper is to show that such a negative view is not always 
warranted. A second aim is to help provide a more intuitive understanding 
of the properties of OLS adjustment when the regression model is incorrect. 
An "agnostic" view of regression [Angrist and Imbens (2002); Angrist and 
Pischke (2009, ch. 3)] is adopted here: without taking the regression model 
literally, we can still make use of properties of OLS that do not depend on 
the model assumptions. 

1.1. Precedents. Similar results on the asymptotic precision of OLS adjustment
with interactions are proved in interesting and useful papers by
Yang and Tsiatis (2001), Tsiatis et al. (2008), and Schochet (2010), un- 
der the assumption that the subjects are a random sample from an infinite 
superpopulation. 2 These results are not widely known, and Freedman was 
apparently unaware of them. He did not analyze adjustment with interac- 
tions, but conjectured, "Treatment by covariate interactions can probably 
be accommodated too" [Freedman (2008b, p. 186)]. 

Like Freedman, I use the Neyman model, in which random assignment 
of a finite population is the sole source of randomness; for a thoughtful 
philosophical discussion of finite- vs. infinite-population inference, see Re- 
ichardt and Gollob (1999, pp. 125-127). My purpose is not to advocate 
finite-population inference, but to show just how little needs to be changed 
to address Freedman's major concerns. The results may help researchers un- 
derstand why and when OLS adjustment can backfire. In large samples, the 
essential problem is omission of treatment × covariate interactions, not the
linear model. With a balanced two-group design, even that problem disappears
asymptotically, because two wrongs make a right (underadjustment of
one group mean cancels out overadjustment of the other).

Neglected parallels between regression adjustment in experiments and 
regression estimators in survey sampling turn out to be very helpful for 
intuition. 

2. Basic framework. For simplicity, the main results in this paper 
assume a completely randomized experiment with two treatment groups (or 
a treatment group and a control group), as in Freedman (2008a). Results 
for designs with more than two groups are discussed informally. 

2.1. The Neyman model with covariates. The notation is adapted from 
Freedman (2008b). There are $n$ subjects, indexed by $i = 1, \ldots, n$. We assign a simple random sample of fixed size $n_A$ to treatment $A$ and the remaining $n - n_A$ subjects to treatment $B$. For each subject, we observe an outcome $Y_i$ and a row vector of covariates $z_i = (z_{i1}, \ldots, z_{iK})$, where $1 \le K < \min(n_A, n - n_A) - 1$. Treatment does not affect the covariates.

Assume that each subject has two potential outcomes [Neyman (1923); Rubin (1974, 2005); Holland (1986)], $a_i$ and $b_i$, which would be observed under treatments $A$ and $B$, respectively. 3 Thus, the observed outcome is $Y_i = a_i T_i + b_i (1 - T_i)$, where $T_i$ is a dummy variable for treatment $A$.

2 Although Tsiatis et al. write that OLS adjustment without interactions "is generally more precise than . . . the difference in sample means" (p. 4661), Yang and Tsiatis's asymptotic variance formula correctly implies that this adjustment may help or hurt precision.

Random assignment is the sole source of randomness in this model. The n 
subjects are the population of interest; they are not assumed to be randomly 
drawn from a superpopulation. For each subject, $a_i$, $b_i$, and $z_i$ are fixed, but
$T_i$ and thus $Y_i$ are random.

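Let $\bar{a}$, $\bar{a}_A$, and $\bar{a}_B$ denote the means of $a_i$ over the population, treatment group $A$, and treatment group $B$:

$$\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i, \qquad \bar{a}_A = \frac{1}{n_A} \sum_{i \in A} a_i, \qquad \bar{a}_B = \frac{1}{n - n_A} \sum_{i \in B} a_i.$$

Use similar notation for the means of $b_i$, $Y_i$, $z_i$, and other variables.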

Our goal is to estimate the average treatment effect of A relative to B: 

$$\mathrm{ATE} = \bar{a} - \bar{b}.$$



2.2. Estimators of average treatment effect. The unadjusted or difference-in-means estimator of ATE is

$$\widehat{\mathrm{ATE}}_{\mathrm{unadj}} = \bar{Y}_A - \bar{Y}_B = \bar{a}_A - \bar{b}_B.$$

The usual OLS-adjusted estimator of ATE is the estimated coefficient on $T_i$ in the OLS regression of $Y_i$ on $T_i$ and $z_i$. (All regressions described in this paper include intercepts.) Let $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ denote this estimator.

A third estimator, $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$, can be computed as the estimated coefficient on $T_i$ in the OLS regression of $Y_i$ on $T_i$, $z_i$, and $T_i (z_i - \bar{z})$. Section 3 motivates this estimator by analogy with regression estimators in survey sampling. In the context of observational studies, Imbens and Wooldridge (2009, pp. 28-30) give a theoretical analysis of $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$, and a related method is known as the Peters-Belson or Oaxaca-Blinder estimator. 4 When $z_i$ is a set of indicators for the values of a categorical variable, $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ is equivalent to subclassification or poststratification [Miratrix, Sekhon, and Yu (2012)].
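
The three estimators defined above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names (`ate_unadj`, `ate_adj`, `ate_interact`, `coef_on_treatment`) and the simulated data are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def coef_on_treatment(X, y):
    """OLS coefficient on the second column of X (the treatment dummy)."""
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def ate_unadj(y, t):
    """Difference in means between treatment groups A (t == 1) and B (t == 0)."""
    return y[t == 1].mean() - y[t == 0].mean()

def ate_adj(y, t, z):
    """Coefficient on T in the regression of Y on an intercept, T, and z."""
    X = np.column_stack([np.ones(len(y)), t, z])
    return coef_on_treatment(X, y)

def ate_interact(y, t, z):
    """Coefficient on T in the regression of Y on an intercept, T, z, and T * (z - zbar)."""
    zc = z - z.mean(axis=0)  # center at the mean over all n subjects
    X = np.column_stack([np.ones(len(y)), t, z, t[:, None] * zc])
    return coef_on_treatment(X, y)

# Hypothetical data: n = 200 subjects, K = 2 covariates, n_A = 80.
rng = np.random.default_rng(0)
n, n_A = 200, 80
z = rng.normal(size=(n, 2))
a = 1.0 + z @ np.array([1.0, 0.5]) + rng.normal(size=n)  # potential outcome under A
b = z @ np.array([-0.5, 0.5]) + rng.normal(size=n)       # potential outcome under B
t = np.zeros(n)
t[rng.choice(n, size=n_A, replace=False)] = 1.0          # complete randomization
y = np.where(t == 1, a, b)                               # observed outcome

print(ate_unadj(y, t), ate_adj(y, t, z), ate_interact(y, t, z))
```

Centering the covariates at the full-sample mean in the interaction term is what makes the coefficient on the treatment dummy an estimate of the average effect rather than the effect at $z_i = 0$.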

3 Most authors use notation such as $Y_i(1)$ and $Y_i(0)$, or $Y_{1i}$ and $Y_{0i}$, for potential outcomes. Freedman's (2008b) choice of $a_i$ and $b_i$ helps make the finite-population asymptotics more readable.

4 See Cochran (1969), Rubin (1984), and Kline (2011). Hansen and Bowers (2009) ana- 
lyze a randomized experiment with a variant of the Peters-Belson estimator derived from 
logistic regression. 

3. Connections with sampling. Cochran (1977, ch. 7) gives a very readable discussion of regression estimators in sampling. 5 In one example [Watson (1937)], the goal was to estimate $\bar{y}$, the average surface area of the leaves on a plant. Measuring a leaf's area is time-consuming, but its weight can be found quickly. So the researcher weighed all the leaves, but measured area for only a small sample. In simple random sampling, the sample mean area $\bar{y}_S$ is an unbiased estimator of $\bar{y}$. But $\bar{y}_S$ ignores the auxiliary data on leaf weights. The sample and population mean weights ($\bar{z}_S$ and $\bar{z}$) are both known, and if $\bar{z} > \bar{z}_S$, then we expect that $\bar{y} > \bar{y}_S$. This motivates a "linear regression estimator"

$$\hat{\bar{y}}_{\mathrm{reg}} = \bar{y}_S + q(\bar{z} - \bar{z}_S) \tag{3.1}$$

where $q$ is an adjustment factor. One way to choose $q$ is to regress leaf area on leaf weight in the sample.

Regression adjustment in randomized experiments can be motivated analogously under the Neyman model. The potential outcome $a_i$ is measured for only a simple random sample (treatment group $A$), but the covariates $z_i$ are measured for the whole population (the $n$ subjects). The sample mean $\bar{a}_A$ is an unbiased estimator of $\bar{a}$, but it ignores the auxiliary data on $z_i$. If the covariates are of some help in predicting $a_i$, then another estimator to consider is

$$\hat{\bar{a}}_{\mathrm{reg}} = \bar{a}_A + (\bar{z} - \bar{z}_A) q_a \tag{3.2}$$

where $q_a$ is a $K \times 1$ vector of adjustment factors. Similarly, we can consider using

$$\hat{\bar{b}}_{\mathrm{reg}} = \bar{b}_B + (\bar{z} - \bar{z}_B) q_b \tag{3.3}$$

to estimate $\bar{b}$ and then $\hat{\bar{a}}_{\mathrm{reg}} - \hat{\bar{b}}_{\mathrm{reg}}$ to estimate $\mathrm{ATE} = \bar{a} - \bar{b}$.

The analogy suggests deriving $q_a$ and $q_b$ from OLS regressions of $a_i$ on $z_i$ in treatment group $A$ and $b_i$ on $z_i$ in treatment group $B$; in other words, separate regressions of $Y_i$ on $z_i$ in the two treatment groups. The estimator $\hat{\bar{a}}_{\mathrm{reg}} - \hat{\bar{b}}_{\mathrm{reg}}$ is then just $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$. If, instead, we use a pooled regression of $Y_i$ on $T_i$ and $z_i$ to derive a single vector $q_a = q_b$, then we get $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$.
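
The equivalence between the regression-estimator form of Eqs. (3.2)-(3.3) and the regression coefficients can be checked numerically. The sketch below is an illustration under assumed, simulated data; the helper names `slopes` and `reg_estimate_of_ate` are hypothetical.

```python
import numpy as np

def slopes(z, y):
    """OLS slopes of y on z (an intercept is included, then dropped)."""
    X = np.column_stack([np.ones(len(y)), z])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

def reg_estimate_of_ate(y, t, z, pooled=False):
    """a_reg - b_reg from Eqs. (3.2)-(3.3), with separate or pooled slopes."""
    A, B = t == 1, t == 0
    zbar = z.mean(axis=0)
    if pooled:
        # One slope vector from the regression of Y on an intercept, T, and z.
        X = np.column_stack([np.ones(len(y)), t, z])
        q_a = q_b = np.linalg.lstsq(X, y, rcond=None)[0][2:]
    else:
        # Separate regressions of Y on z within each treatment group.
        q_a, q_b = slopes(z[A], y[A]), slopes(z[B], y[B])
    a_reg = y[A].mean() + (zbar - z[A].mean(axis=0)) @ q_a  # Eq. (3.2)
    b_reg = y[B].mean() + (zbar - z[B].mean(axis=0)) @ q_b  # Eq. (3.3)
    return a_reg - b_reg

# Hypothetical data for the check.
rng = np.random.default_rng(2)
n, n_A = 300, 100
z = rng.uniform(-1, 1, size=(n, 2))
t = np.zeros(n)
t[rng.choice(n, size=n_A, replace=False)] = 1.0
y = np.where(t == 1, 2 + z[:, 0] ** 2, z[:, 1]) + rng.normal(size=n)

# Coefficients on T from the interacted and the pooled regressions.
zc = z - z.mean(axis=0)
X_int = np.column_stack([np.ones(n), t, z, t[:, None] * zc])
X_adj = np.column_stack([np.ones(n), t, z])
tau_int = np.linalg.lstsq(X_int, y, rcond=None)[0][1]
tau_adj = np.linalg.lstsq(X_adj, y, rcond=None)[0][1]

print(reg_estimate_of_ate(y, t, z), tau_int)
print(reg_estimate_of_ate(y, t, z, pooled=True), tau_adj)
```

Each printed pair should agree up to floating-point error, because the fully interacted regression reproduces the two within-group fits and the pooled regression's residuals average to zero within each group.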

Connections between regression adjustment in experiments and regression estimators in sampling have been noted but remain underexplored. 6 All three of the issues that Freedman raised have parallels in the sampling literature. Under simple random sampling, when the regression model is incorrect, OLS adjustment of the estimated mean still improves or does not hurt asymptotic precision [Cochran (1977)], consistent standard error estimators are available [Fuller (1975)], and the adjusted estimator of the mean has a small-sample bias [Cochran (1942)].

5 See also Fuller (2002, 2009).

6 Connections are noted by Fienberg and Tanur (1987), Hansen and Bowers (2009), and Middleton and Aronow (2012) but are not mentioned by Cochran despite his important contributions to both literatures. He takes a design-based (agnostic) approach in much of his work on sampling, but assumes a regression model in his classic overview of regression adjustment in experiments and observational studies [Cochran (1957)].

4. Asymptotic precision. 

4.1. Precision improvement in sampling. This subsection gives an infor- 
mal argument, adapted from Cochran (1977), to show that in simple random 
sampling, OLS adjustment of the sample mean improves or does not hurt 
asymptotic precision, even when the regression model is incorrect. Regu- 
larity conditions and other technical details are omitted; the purpose is to 
motivate the results on completely randomized experiments in Section 4.2. 

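First imagine using a "fixed-slope" regression estimator, where $q$ in Eq. (3.1) is fixed at some value $q_0$ before sampling:

$$\hat{\bar{y}}_f = \bar{y}_S + q_0(\bar{z} - \bar{z}_S).$$

If $q_0 = 0$, $\hat{\bar{y}}_f$ is just $\bar{y}_S$. More generally, $\hat{\bar{y}}_f$ is the sample mean of $y_i - q_0(z_i - \bar{z})$, so its variance follows the usual formula with a finite-population correction:

$$\mathrm{var}(\hat{\bar{y}}_f) = \frac{N - n}{N - 1} \, \frac{1}{n} \, \frac{1}{N} \sum_{i=1}^{N} \left[ (y_i - \bar{y}) - q_0 (z_i - \bar{z}) \right]^2$$

where $N$ is the population size and $n$ is the sample size.

Thus, choosing $q_0$ to minimize the variance of $\hat{\bar{y}}_f$ is equivalent to running an OLS regression of $y_i$ on $z_i$ in the population. The solution is the "population least squares" slope,

$$q_{\mathrm{PLS}} = \frac{\sum_{i=1}^{N} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{N} (z_i - \bar{z})^2},$$

and the minimum-variance fixed-slope regression estimator is

$$\hat{\bar{y}}_{\mathrm{PLS}} = \bar{y}_S + q_{\mathrm{PLS}}(\bar{z} - \bar{z}_S).$$

Since the sample mean $\bar{y}_S$ is a fixed-slope regression estimator, it follows that $\hat{\bar{y}}_{\mathrm{PLS}}$ has lower variance than the sample mean, unless $q_{\mathrm{PLS}} = 0$ (in which case $\hat{\bar{y}}_{\mathrm{PLS}} = \bar{y}_S$).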

The actual OLS regression estimator is almost as precise as $\hat{\bar{y}}_{\mathrm{PLS}}$ in sufficiently large samples. The difference between the two estimators is

$$\hat{\bar{y}}_{\mathrm{OLS}} - \hat{\bar{y}}_{\mathrm{PLS}} = (\hat{q}_{\mathrm{OLS}} - q_{\mathrm{PLS}})(\bar{z} - \bar{z}_S)$$

where $\hat{q}_{\mathrm{OLS}}$ is the estimated slope from a regression of $y_i$ on $z_i$ in the sample. The estimation errors $\hat{q}_{\mathrm{OLS}} - q_{\mathrm{PLS}}$, $\bar{z}_S - \bar{z}$, and $\hat{\bar{y}}_{\mathrm{PLS}} - \bar{y}$ are of order $1/\sqrt{n}$ in probability. Thus, the difference $\hat{\bar{y}}_{\mathrm{OLS}} - \hat{\bar{y}}_{\mathrm{PLS}}$ is of order $1/n$, which is negligible compared to the estimation error in $\hat{\bar{y}}_{\mathrm{PLS}}$ when $n$ is large enough. In sum, in large enough samples,

$$\mathrm{var}(\hat{\bar{y}}_{\mathrm{OLS}}) \approx \mathrm{var}(\hat{\bar{y}}_{\mathrm{PLS}}) \le \mathrm{var}(\bar{y}_S)$$

and the inequality is strict unless $y_i$ and $z_i$ are uncorrelated in the population.
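
The informal argument of this subsection can be illustrated with a small simulation. This is a sketch under assumptions: the population, its size, the sample size, and the number of replications are invented for the example, and the outcome is made deliberately nonlinear in the auxiliary variable so that the regression model is incorrect.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, reps = 5000, 200, 2000             # hypothetical population/sample sizes
z = rng.uniform(-4, 4, N)
y = np.exp(z / 2) + rng.normal(size=N)   # deliberately nonlinear in z

zc, yc = z - z.mean(), y - y.mean()
q_pls = (zc @ yc) / (zc @ zc)            # population least squares slope

def var_fixed_slope(q0):
    """Variance of the fixed-slope estimator with slope q0 (with finite-population correction)."""
    resid = yc - q0 * zc
    return (N - n) / (N - 1) / n * np.mean(resid ** 2)

means, ols = np.empty(reps), np.empty(reps)
for r in range(reps):
    s = rng.choice(N, size=n, replace=False)      # simple random sample
    zs, ys = z[s], y[s]
    q_hat = np.cov(zs, ys)[0, 1] / np.var(zs, ddof=1)  # sample OLS slope
    means[r] = ys.mean()
    ols[r] = ys.mean() + q_hat * (z.mean() - zs.mean())  # Eq. (3.1)

print("sample mean:   formula", var_fixed_slope(0.0), "simulated", means.var())
print("OLS estimator: PLS formula", var_fixed_slope(q_pls), "simulated", ols.var())
```

The simulated variance of the OLS regression estimator should fall close to the fixed-slope formula evaluated at $q_{\mathrm{PLS}}$, and below the variance of the sample mean, even though the linear model is wrong here.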

4.2. Precision improvement in experiments. The sampling result naturally leads to the conjecture that in a completely randomized experiment, OLS adjustment with a full set of treatment × covariate interactions improves or does not hurt asymptotic precision, even when the regression model is incorrect. The adjusted estimator $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ is just the difference between two OLS regression estimators from sampling theory, while $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$ is the difference between two sample means.

The conjecture is confirmed below. To summarize the results:

1. $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ is consistent and asymptotically normal (as are $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$ and $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$, from Freedman's results).

2. Asymptotically, $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ is at least as efficient as $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$, and more efficient unless the covariates are uncorrelated with the weighted average

$$\frac{n - n_A}{n} a_i + \frac{n_A}{n} b_i.$$

3. Asymptotically, $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ is at least as efficient as $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$, and more efficient unless (a) the two treatment groups have equal size or (b) the covariates are uncorrelated with the treatment effect $a_i - b_i$.

4.2.1. Assumptions for asymptotics. Finite-population asymptotic results 
are statements about randomized experiments on (or random samples from) 
an imaginary infinite sequence of finite populations, with increasing n. The 
regularity conditions (assumptions on the limiting behavior of the sequence) 
may seem vacuous, since one can always construct a sequence that contains 
the actual population and still satisfies the conditions. But it may be useful 
to ask whether a sequence that preserves any relevant "irregularities" (such
as the influence of gross outliers) would violate the regularity conditions. 
See also Lumley (2010, pp. 217-218). 

The asymptotic results in this paper assume Freedman's (2008b) regular- 
ity conditions, generalized to allow multiple covariates; the number of co- 
variates K is constant as n grows. One practical interpretation of these con- 
ditions is that in order for the results to be applicable, the size of each treat- 
ment group should be sufficiently large (and much larger than the number 
of covariates), the influence of outliers should be small, and near-collinearity 
in the covariates should be avoided. 

As Freedman (2008a) notes, in principle, there should be an extra subscript to index the sequence of populations: for example, in the population with $n$ subjects, the $i$th subject has potential outcomes $a_{i,n}$ and $b_{i,n}$, and the average treatment effect is $\mathrm{ATE}_n$. Like Freedman, I drop the extra subscripts.

Condition 1. There is a bound $L < \infty$ such that for all $n = 1, 2, \ldots$ and $k = 1, \ldots, K$,

$$\frac{1}{n} \sum_{i=1}^{n} a_i^4 < L, \qquad \frac{1}{n} \sum_{i=1}^{n} b_i^4 < L, \qquad \frac{1}{n} \sum_{i=1}^{n} z_{ik}^4 < L.$$

Condition 2. Let $Z$ be the $n \times (K + 1)$ matrix whose $i$th row is $(1, z_i)$. Then $n^{-1} Z'Z$ converges to a finite, invertible matrix. Also, the population means of $a_i$, $b_i$, $a_i^2$, $b_i^2$, $a_i b_i$, $a_i z_i$, and $b_i z_i$ converge to finite limits. For example, $\lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} a_i z_i$ exists and is a finite vector.

Condition 3. The proportion $n_A/n$ converges to a limit $p_A$, with $0 < p_A < 1$.

4.2.2. Asymptotic results. Let $Q_a$ denote the limit of the vector of slope coefficients in the population least squares regression of $a_i$ on $z_i$. That is,

$$Q_a = \lim_{n \to \infty} \left[ \sum_{i=1}^{n} (z_i - \bar{z})'(z_i - \bar{z}) \right]^{-1} \sum_{i=1}^{n} (z_i - \bar{z})'(a_i - \bar{a}).$$

Define $Q_b$ analogously.

Now define the prediction errors

$$a_i^* = (a_i - \bar{a}) - (z_i - \bar{z}) Q_a, \qquad b_i^* = (b_i - \bar{b}) - (z_i - \bar{z}) Q_b$$

for $i = 1, \ldots, n$.

For any variables $x_i$ and $y_i$, let $\sigma_x^2$ and $\sigma_{x,y}$ denote the population variance of $x_i$ and the population covariance of $x_i$ and $y_i$. For example,

$$\sigma_{a^*,b^*} = \frac{1}{n} \sum_{i=1}^{n} (a_i^* - \overline{a^*})(b_i^* - \overline{b^*}) = \frac{1}{n} \sum_{i=1}^{n} a_i^* b_i^*.$$

Theorem 1 and its corollaries are proved in the Appendix.

Theorem 1. Assume Conditions 1-3. Then $\sqrt{n}(\widehat{\mathrm{ATE}}_{\mathrm{interact}} - \mathrm{ATE})$ converges in distribution to a Gaussian random variable with mean 0 and variance

$$\frac{1 - p_A}{p_A} \lim_{n \to \infty} \sigma_{a^*}^2 + \frac{p_A}{1 - p_A} \lim_{n \to \infty} \sigma_{b^*}^2 + 2 \lim_{n \to \infty} \sigma_{a^*,b^*}.$$

Corollary 1.1. Assume Conditions 1-3. Then $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$ has at least as much asymptotic variance as $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$. The difference is

$$\frac{1}{n p_A (1 - p_A)} \lim_{n \to \infty} \sigma_E^2$$

where $E_i = (z_i - \bar{z}) Q_E$ and $Q_E = (1 - p_A) Q_a + p_A Q_b$. Therefore, adjustment with $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ helps asymptotic precision if $Q_E \ne 0$ and is neutral if $Q_E = 0$.

Remarks. (i) $Q_E$ can be thought of as a weighted average of $Q_a$ and $Q_b$, or as the limit of the vector of slope coefficients in the population least squares regression of $(1 - p_A) a_i + p_A b_i$ on $z_i$.

(ii) The weights may seem counterintuitive at first, but the sampling analogy and Eqs. (3.2-3.3) can help. Other things being equal, adjustment has a larger effect on the estimated mean from the smaller treatment group, because its mean covariate values are further away from the population mean. The adjustment added to $\bar{a}_A$ is

$$(\bar{z} - \bar{z}_A) \hat{Q}_a = \frac{n - n_A}{n} (\bar{z}_B - \bar{z}_A) \hat{Q}_a$$

while the adjustment added to $\bar{b}_B$ is

$$(\bar{z} - \bar{z}_B) \hat{Q}_b = -\frac{n_A}{n} (\bar{z}_B - \bar{z}_A) \hat{Q}_b,$$

where $\hat{Q}_a$ and $\hat{Q}_b$ are OLS estimates that converge to $Q_a$ and $Q_b$.

(iii) If the covariates' associations with $a_i$ and $b_i$ go in opposite directions, it is possible for adjustment with $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ to have no effect on asymptotic precision. Specifically, if $(1 - p_A) Q_a = -p_A Q_b$, the adjustments to $\bar{a}_A$ and $\bar{b}_B$ tend to cancel each other out.

(iv) In designs with more than two treatment groups, estimators analogous to $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ can be derived from a separate regression in each treatment group, or equivalently a single regression with the appropriate treatment dummies, covariates, and interactions. The resulting estimator of (for example) $\bar{a} - \bar{b}$ is at least as efficient as $\bar{Y}_A - \bar{Y}_B$, and more efficient unless the covariates are uncorrelated with both $a_i$ and $b_i$. The Appendix gives a proof.
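
Theorem 1 and Corollary 1.1 can be checked numerically when the limits are replaced by the moments of one fixed finite population. The sketch below rests on assumptions: the population is simulated for the example, the helper names are hypothetical, and the variance of the unadjusted estimator is taken to be the standard Neyman randomization variance, which has the same form as the Theorem 1 expression with the raw potential outcomes in place of the prediction errors. All quantities are on the scale of $\sqrt{n}$ times the estimator, so the factor $1/n$ in Corollary 1.1 is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_A = 1000, 0.75                       # hypothetical population size and design
z = rng.uniform(-4, 4, size=(n, 1))
a = (np.exp(z[:, 0]) + np.exp(z[:, 0] / 2)) / 4 + rng.normal(size=n)
b = (-np.exp(z[:, 0]) + np.exp(z[:, 0] / 2)) / 4 + rng.normal(size=n)

zc = z - z.mean(axis=0)                   # centered covariates

def pls_slopes(v):
    """Population least squares slopes of v on z."""
    return np.linalg.lstsq(zc, v - v.mean(), rcond=None)[0]

def scaled_var(u, v):
    """(1-p)/p var(u) + p/(1-p) var(v) + 2 cov(u, v), using population moments."""
    uc, vc = u - u.mean(), v - v.mean()
    return ((1 - p_A) / p_A * np.mean(uc ** 2)
            + p_A / (1 - p_A) * np.mean(vc ** 2)
            + 2 * np.mean(uc * vc))

Q_a, Q_b = pls_slopes(a), pls_slopes(b)
a_star = (a - a.mean()) - zc @ Q_a        # prediction errors a_i^*
b_star = (b - b.mean()) - zc @ Q_b        # prediction errors b_i^*

avar_unadj = scaled_var(a, b)             # unadjusted estimator (Neyman variance)
avar_interact = scaled_var(a_star, b_star)  # Theorem 1 expression

Q_E = (1 - p_A) * Q_a + p_A * Q_b
gap = np.mean((zc @ Q_E) ** 2) / (p_A * (1 - p_A))  # Corollary 1.1, times n

print(avar_unadj - avar_interact, gap)
print("approximate SDs:", np.sqrt(avar_unadj / n), np.sqrt(avar_interact / n))
```

With population least squares slopes, the two printed quantities on the first line coincide up to rounding, which mirrors the algebra behind the corollary: the prediction errors are orthogonal to the covariates, so the reduction in variance is exactly the variance of $E_i$ divided by $p_A(1 - p_A)$.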

Corollary 1.2. Assume Conditions 1-3. Then $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ has at least as much asymptotic variance as $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$. The difference is

$$\frac{(2 p_A - 1)^2}{n p_A (1 - p_A)} \lim_{n \to \infty} \sigma_D^2$$

where $D_i = (z_i - \bar{z})(Q_a - Q_b)$. Therefore, the two estimators have equal asymptotic precision if $p_A = 1/2$ or $Q_a = Q_b$. Otherwise, $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ is asymptotically more efficient.

Remarks. (i) $Q_a - Q_b$ is the limit of the vector of slope coefficients in the population least squares regression of the treatment effect $a_i - b_i$ on $z_i$.

(ii) For intuition about the behavior of $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$, suppose there is a single covariate, $z_i$, and the population least squares slopes are $Q_a = 10$ and $Q_b = 2$. Let $\hat{Q}$ denote the estimated coefficient on $z_i$ from a pooled OLS regression of $Y_i$ on $T_i$ and $z_i$. In sufficiently large samples, $\hat{Q}$ tends to fall close to $p_A Q_a + (1 - p_A) Q_b$. Consider two cases:

• If the two treatment groups have equal size, then $\bar{z} - \bar{z}_B = -(\bar{z} - \bar{z}_A)$, so when $\bar{z} - \bar{z}_A = 1$, the ideal linear adjustment would add 10 to $\bar{a}_A$ and subtract 2 from $\bar{b}_B$. Instead, $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ uses the pooled slope estimate $\hat{Q} \approx 6$, so it tends to underadjust $\bar{a}_A$ (adding about 6) and overadjust $\bar{b}_B$ (subtracting about 6). Two wrongs make a right: the adjustment adds about 12 to $\bar{a}_A - \bar{b}_B$, just as $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ would have done.

• If group $A$ is 9 times larger than group $B$, then $\bar{z} - \bar{z}_B = -9(\bar{z} - \bar{z}_A)$, so when $\bar{z} - \bar{z}_A = 1$, the ideal linear adjustment adds 10 to $\bar{a}_A$ and subtracts $9 \cdot 2 = 18$ from $\bar{b}_B$, thus adding 28 to the estimate of ATE. In contrast, the pooled adjustment adds $\hat{Q} \approx 9.2$ to $\bar{a}_A$ and subtracts $9\hat{Q} \approx 82.8$ from $\bar{b}_B$, thus adding about 92 to the estimate of ATE. The problem is that the pooled regression has more observations of $a_i$ than of $b_i$, but the adjustment has a larger effect on the estimate of $\bar{b}$ than on that of $\bar{a}$, since group $B$'s mean covariate value is further away from the population mean.

(iii) The example above suggests an alternative regression adjustment: when group $A$ has nine-tenths of the subjects, give group $B$ nine-tenths of the weight. More generally, let $\tilde{p}_A = n_A/n$. Run a weighted least squares regression of $Y_i$ on $T_i$ and $z_i$, with weights of $(1 - \tilde{p}_A)/\tilde{p}_A$ on each observation from group $A$ and $\tilde{p}_A/(1 - \tilde{p}_A)$ on each observation from group $B$. This "tyranny of the minority" estimator is asymptotically equivalent to $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ (the Appendix outlines a proof). It is equal to $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ when $\tilde{p}_A = 1/2$.

(iv) The tyranny estimator can also be seen as a one-step variant of Rubin and van der Laan's (2011) two-step "targeted ANCOVA." Their estimator is equivalent to the difference in means of the residuals from a weighted least squares regression of $Y_i$ on $z_i$, with the same weights as in remark (iii).

(v) When is the usual adjustment worse than no adjustment? Eq. (23) in Freedman (2008a) implies that with a single covariate $z_i$, for $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ to have higher asymptotic variance than $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$, a necessary (but not sufficient) condition is that either the design must be so imbalanced that more than three-quarters of the subjects are assigned to one group, or $z_i$ must have a larger covariance with the treatment effect $a_i - b_i$ than with the expected outcome $p_A a_i + (1 - p_A) b_i$. With multiple covariates, a similar condition can be derived from Eq. (14) in Schochet (2010).

(vi) With more than two treatment groups, the usual adjustment can be worse than no adjustment even when the design is balanced [Freedman (2008b)]. All the groups are pooled in a single regression without treatment × covariate interactions, so group $B$'s data can affect the contrast between $A$ and $C$.
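
The arithmetic in remark (ii), and the reason the weights in remark (iii) repair it, can be reproduced in a few lines. This is a sketch under assumptions: the function `adjustments` is hypothetical, it works with large-sample slope values rather than estimates, and the slope attributed to the weighted regression assumes that the covariate has roughly the same variance in both treatment groups, as randomization implies in large samples.

```python
def adjustments(p_A, Q_a=10.0, Q_b=2.0, gap_A=1.0):
    """Amount added to the ATE estimate when zbar - zbar_A equals gap_A.

    Returns (ideal, pooled, tyranny): separate slopes, the usual pooled
    slope, and the slope implied by the weights of remark (iii).
    """
    gap_B = -p_A / (1 - p_A) * gap_A            # zbar - zbar_B
    ideal = gap_A * Q_a - gap_B * Q_b           # separate slopes for a and b
    Q_pooled = p_A * Q_a + (1 - p_A) * Q_b      # large-sample pooled slope
    pooled = (gap_A - gap_B) * Q_pooled
    Q_weighted = (1 - p_A) * Q_a + p_A * Q_b    # weights favor the smaller group
    tyranny = (gap_A - gap_B) * Q_weighted
    return ideal, pooled, tyranny

for p in (0.5, 0.9):
    print(p, [round(v, 6) for v in adjustments(p)])
```

With equal group sizes all three adjustments equal 12. With nine-tenths of the subjects in group $A$, the ideal and reweighted adjustments are 28 while the pooled adjustment is 92, matching the numbers in remark (ii).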

4.2.3. Example. This simulation illustrates some of the key ideas. 

1. For $n = 1{,}000$ subjects, a covariate $z_i$ was drawn from the uniform distribution on $[-4, 4]$. The potential outcomes were then generated as

$$a_i = \frac{\exp(z_i) + \exp(z_i/2)}{4} + \nu_i, \qquad b_i = \frac{-\exp(z_i) + \exp(z_i/2)}{4} + \varepsilon_i,$$

with $\nu_i$ and $\varepsilon_i$ drawn independently from the standard normal distribution.

Table 1
Simulation (1,000 subjects; 40,000 replications)

                              Proportion assigned to treatment A
Estimator                     0.75     0.6     0.5     0.4    0.25

SD (asymptotic) × 1,000
  Unadjusted                    93      49      52      78     143
  Usual OLS-adjusted           171      72      46      79     180
  OLS with interaction          80      49      46      58      98
  Tyranny of the minority       80      49      46      58      98

SD (empirical) × 1,000
  Unadjusted                    93      49      53      78     142
  Usual OLS-adjusted           171      73      47      80     180
  OLS with interaction          81      50      47      59      99
  Tyranny of the minority       81      50      47      59      99

Bias (estimated) × 1,000
  Unadjusted                     0       0       0       0      -2
  Usual OLS-adjusted            -3      -3      -3      -3      -5
  OLS with interaction          -5      -3      -3      -4      -6
  Tyranny of the minority       -5      -3      -3      -4      -6


2. A completely randomized experiment was simulated 40,000 times, assigning $n_A = 750$ subjects to treatment $A$ and the remainder to treatment $B$.

3. Step 2 was repeated for four other values of $n_A$ (600, 500, 400, and 250).

These are adverse conditions for regression adjustment: $z_i$ covaries much more with the treatment effect $a_i - b_i$ than with the potential outcomes, and the population least squares slopes $Q_a = 1.06$ and $Q_b = -0.73$ are of opposite signs.

Table 1 compares $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$, $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$, $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$, and the "tyranny of the minority" estimator from remark (iii) after Corollary 1.2. The first panel shows the asymptotic standard errors derived from Freedman's (2008b) Theorems 1 and 2 and this paper's Theorem 1 (with limits replaced by actual population values). The second and third panels show the empirical standard deviations and bias estimates from the Monte Carlo simulation.

The empirical standard deviations are very close to the asymptotic predictions, and the estimated biases are small in comparison. The usual adjustment hurts precision except when $n_A/n = 0.5$. In contrast, $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ and the tyranny estimator improve precision except when $n_A/n = 0.6$. [This is approximately the value of $p_A$ where $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ and $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$ have equal asymptotic variance; see remark (iii) after Corollary 1.1.]






Randomization does not "justify" the regression model of $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$, and the linearity assumption is far from accurate in this example, but the estimator solves Freedman's asymptotic precision problem.
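
A scaled-down version of this simulation can be sketched as follows. It is not the code used for Table 1: the seed, the default of 2,000 replications, and the function names `simulate` and `coef_on_T` are assumptions made for the example, and the population is redrawn here, so the numbers will differ somewhat from those reported above.

```python
import numpy as np

def coef_on_T(X, y, w=None):
    """Coefficient on column 1 (the treatment dummy) from (weighted) least squares."""
    if w is not None:
        sw = np.sqrt(w)                       # WLS via rescaled rows
        X, y = X * sw[:, None], y * sw
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def simulate(n=1000, n_A=750, reps=2000, seed=0):
    """Return (SDs, biases) for unadjusted, adjusted, interacted, tyranny estimators."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-4, 4, n)
    a = (np.exp(z) + np.exp(z / 2)) / 4 + rng.normal(size=n)
    b = (-np.exp(z) + np.exp(z / 2)) / 4 + rng.normal(size=n)
    ate = a.mean() - b.mean()                 # fixed finite-population ATE
    zc, ones, p = z - z.mean(), np.ones(n), n_A / n
    est = np.empty((reps, 4))
    for r in range(reps):
        t = np.zeros(n)
        t[rng.choice(n, size=n_A, replace=False)] = 1.0   # complete randomization
        y = np.where(t == 1, a, b)
        X_adj = np.column_stack([ones, t, z])
        X_int = np.column_stack([ones, t, z, t * zc])
        w = np.where(t == 1, (1 - p) / p, p / (1 - p))    # weights from remark (iii)
        est[r] = (y[t == 1].mean() - y[t == 0].mean(),
                  coef_on_T(X_adj, y),
                  coef_on_T(X_int, y),
                  coef_on_T(X_adj, y, w))
    return est.std(axis=0), est.mean(axis=0) - ate

sd, bias = simulate(reps=500)
print("SD x 1,000:  ", np.round(1000 * sd))
print("bias x 1,000:", np.round(1000 * bias))
```

Only the random assignment is redrawn inside the loop; the covariate and potential outcomes are generated once and held fixed, in keeping with the Neyman model.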

5. Variance estimation. Eicker (1967) and White (1980ab) proposed 
a covariance matrix estimator for OLS that is consistent under simple ran- 
dom sampling from an infinite population. The regression model assump- 
tions, such as linearity and homoskedasticity, are not needed for this result. 7 
The estimator is 

$$(X'X)^{-1} X' \, \mathrm{diag}(\hat{e}_1^2, \ldots, \hat{e}_n^2) \, X (X'X)^{-1}$$

where $X$ is the matrix of regressors and $\hat{e}_i$ is the $i$th OLS residual. It is
known as the sandwich estimator because of its form, or as the Huber- 
White estimator because it is the sample analog of Huber's (1967) formula 
for the asymptotic variance of a maximum likelihood estimator when the 
model is incorrect. 

Theorem 2 shows that under the Neyman model, the sandwich variance estimators for $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ and $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ are consistent or asymptotically conservative. Together, Theorems 1 and 2 in this paper and Theorem 2 in Freedman (2008b) imply that asymptotically valid confidence intervals for ATE can be constructed from either $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ or $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ and the sandwich standard error estimator.
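
The sandwich formula translates directly into code. The sketch below computes it (the uncorrected form, often called HC0) for the interacted regression and forms a normal-approximation 95% interval; the function name `ols_sandwich` and the simulated data are assumptions for illustration only.

```python
import numpy as np

def ols_sandwich(X, y):
    """OLS coefficients and the sandwich covariance (X'X)^-1 X' diag(e^2) X (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                          # OLS residuals
    meat = (X * e[:, None] ** 2).T @ X        # X' diag(e^2) X
    return beta, XtX_inv @ meat @ XtX_inv

# Hypothetical experiment: one covariate, n = 400, n_A = 150.
rng = np.random.default_rng(3)
n, n_A = 400, 150
z = rng.uniform(-2, 2, n)
t = np.zeros(n)
t[rng.choice(n, size=n_A, replace=False)] = 1.0
y = np.where(t == 1, 1 + np.exp(z / 2), 0.5 * z) + rng.normal(size=n)

X = np.column_stack([np.ones(n), t, z, t * (z - z.mean())])
beta, V = ols_sandwich(X, y)
est, se = beta[1], np.sqrt(V[1, 1])
print("ATE_interact:", est, "sandwich SE:", se)
print("95% CI:", (est - 1.96 * se, est + 1.96 * se))
```

For small samples or high-leverage points, a bias-corrected variant may be preferable to this uncorrected form; see remark (iv) after Theorem 2.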

The vectors $Q_a$ and $Q_b$ were defined in Section 4.2.2. Let $Q$ denote the weighted average $p_A Q_a + (1 - p_A) Q_b$. As shown in Freedman (2008b) and the Appendix to this paper, $Q$ is the probability limit of the vector of estimated coefficients on $z_i$ in the OLS regression of $Y_i$ on $T_i$ and $z_i$.

Mimicking Section 4.2.2, define the prediction errors

$$a_i^{**} = (a_i - \bar{a}) - (z_i - \bar{z}) Q, \qquad b_i^{**} = (b_i - \bar{b}) - (z_i - \bar{z}) Q$$

for $i = 1, \ldots, n$.

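Theorem 2 is proved in the Appendix.

Theorem 2. Assume Conditions 1-3. Let $\hat{v}_{\mathrm{adj}}$ and $\hat{v}_{\mathrm{interact}}$ denote the sandwich variance estimators for $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ and $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$. Then $n \hat{v}_{\mathrm{adj}}$ converges in probability to

$$\frac{1}{p_A} \lim_{n \to \infty} \sigma_{a^{**}}^2 + \frac{1}{1 - p_A} \lim_{n \to \infty} \sigma_{b^{**}}^2,$$

which is greater than or equal to the true asymptotic variance of $\sqrt{n}(\widehat{\mathrm{ATE}}_{\mathrm{adj}} - \mathrm{ATE})$. The difference is

$$\lim_{n \to \infty} \sigma_{(a - b)}^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \left[ (a_i - b_i) - \mathrm{ATE} \right]^2.$$

Similarly, $n \hat{v}_{\mathrm{interact}}$ converges in probability to

$$\frac{1}{p_A} \lim_{n \to \infty} \sigma_{a^*}^2 + \frac{1}{1 - p_A} \lim_{n \to \infty} \sigma_{b^*}^2,$$

which is greater than or equal to the true asymptotic variance of $\sqrt{n}(\widehat{\mathrm{ATE}}_{\mathrm{interact}} - \mathrm{ATE})$. The difference is

$$\lim_{n \to \infty} \sigma_{(a^* - b^*)}^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \left[ (a_i - b_i) - \mathrm{ATE} - (z_i - \bar{z})(Q_a - Q_b) \right]^2.$$

7 See, e.g., Chamberlain (1982, pp. 17-19) or Angrist and Pischke (2009, pp. 40-). Fuller (1975) proves a finite-population version of the result.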

Remarks, (i) Theorem 2 generalizes to designs with more than two 
treatment groups. 

(ii) With two treatment groups of equal size, the conventional OLS variance estimator for $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ is also consistent or asymptotically conservative [Freedman (2008a)].

(iii) Freedman (2008a) shows analogous results for variance estimators for the difference in means; the issue there is whether to assume $\sigma_a^2 = \sigma_b^2$.
Reichardt and Gollob (1999) and Freedman, Pisani, and Purves (2007, pp. 
508-511) give helpful expositions of basic results under the Neyman model. 
Related issues appear in discussions of the two-sample problem [Miller (1986, 
pp. 56-62); Stonehouse and Forrester (1998)] and randomization tests [Gail 
et al. (1996); Chung and Romano (2011ab)]. 

(iv) With a small sample or points of high leverage, the sandwich esti- 
mator can have substantial downward bias and high variability. MacKin- 
non (2013) discusses bias-corrected sandwich estimators and improved con- 
fidence intervals based on the wild bootstrap. See also Wu (1986), Tibshirani 
(1986), Angrist and Pischke (2009, ch. 8), and Kline and Santos (2012). 

(v) When $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$ is computed by regressing $Y_i$ on $T_i$, the HC2 bias-corrected sandwich estimator [MacKinnon and White (1985); Royall and Cumberland (1978); Wu (1986, p. 1274)] gives exactly the variance estimate preferred by Neyman (1923) and Freedman (2008a): $\hat{\sigma}_a^2/n_A + \hat{\sigma}_b^2/(n - n_A)$, where $\hat{\sigma}_a^2$ and $\hat{\sigma}_b^2$ are the sample variances of $Y_i$ in the two groups. 8



8 For details, see Hinkley and Wang (1991), Angrist and Pischke (2009, pp. 294-304),
or Samii and Aronow (2012).

(vi) When the n subjects are randomly drawn from a superpopulation,
$\hat{v}_{\mathrm{interact}}$ does not take into account the variability in $\bar{z}$ [Imbens and Wooldridge
(2009, pp. 28-30)]. In the Neyman model, $\bar{z}$ is fixed.

(vii) Freedman's (2006) critique of the sandwich estimator does not apply
here, as $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ and $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ are consistent even when their regression
models are incorrect.

(viii) Freedman (2008a) associates the difference in means and regres- 
sion with heteroskedasticity-robust and conventional variance estimators, 
respectively. His rationale for these pairings is unclear. The pooled-variance 
two-sample t-test and the conventional F-test for equality of means are often
used in difference-in-means analyses. Conversely, the sandwich estimator
has become the usual variance estimator for regression in economics [Stock 
(2010)]. The question of whether to adjust for covariates should be disen- 
tangled from the question of whether to assume homoskedasticity. 
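
Remark (v) can be checked numerically. The following is a minimal sketch, assuming NumPy and purely synthetic data (the group sizes 58 and 99 mirror the example in Section 7, but the outcomes are simulated and the function names are hypothetical). It computes the HC2 sandwich variance for the coefficient on $T_i$ in the regression of $Y_i$ on an intercept and $T_i$, and compares it with the sum of within-group sample variances divided by group sizes.

```python
import numpy as np

def hc2_variance_of_difference(y, t):
    """HC2 sandwich variance of the coefficient on t in OLS of y on (1, t)."""
    X = np.column_stack([np.ones(y.size), t])
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))
    # Diagonal of the hat matrix X (X'X)^{-1} X'
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # HC2 inflates each squared residual by 1 / (1 - h_ii)
    meat = (X * (resid**2 / (1.0 - leverage))[:, None]).T @ X
    return (XtX_inv @ meat @ XtX_inv)[1, 1]

def neyman_variance(y, t):
    """Within-group sample variances divided by group sizes, summed."""
    y_a, y_b = y[t == 1], y[t == 0]
    return y_a.var(ddof=1) / y_a.size + y_b.var(ddof=1) / y_b.size

# Synthetic illustration only: 58 treated and 99 control units.
rng = np.random.default_rng(0)
t = np.zeros(157)
t[rng.choice(157, size=58, replace=False)] = 1.0
y = rng.normal(1.8, 0.9, size=157) + t * rng.normal(0.0, 0.5, size=157)

hc2 = hc2_variance_of_difference(y, t)
neyman = neyman_variance(y, t)
print(hc2, neyman)
```

In this two-group regression the leverage of each unit is the reciprocal of its group size, which is why the two quantities agree up to floating-point error.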

6. Bias. The bias of OLS adjustment diminishes rapidly with the number
of randomly assigned units: $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ and $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ have biases of order
1/n, while their standard errors are of order $1/\sqrt{n}$. Brief remarks follow; see
also Deaton (2010, pp. 443-444), Imbens (2010, pp. 410-411), and Green 
and Aronow (2011). 

(i) If the actual random assignment yields substantial covariate imbalance, 
it is hardly reassuring to be told that the difference in means is unbiased over 
all possible random assignments. Senn (1989) and Cox and Reid (2000, pp. 
29-32) argue that inference should be conditional on a measure of covariate 
imbalance, and that the conditional bias of $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$ justifies adjustment.
Tukey (1991) suggests adjustment "perhaps as a supplemental analysis" for 
"protection against either the consequences of inadequate randomization or 
the (random) occurrence of an unusual randomization." 

(ii) As noted in Section 2.2, poststratification is a special case of $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$.
The poststratified estimator is a population-weighted average of subgroup-specific
differences in means. Conditional on the numbers of subgroup members
assigned to each treatment, the poststratified estimator is unbiased, but
$\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$ can be biased. Miratrix, Sekhon, and Yu (2012) give finite-sample
and asymptotic analyses of poststratification and blocking; see also Holt and 
Smith (1979) in the sampling context. 

(iii) Cochran (1977) analyzes the bias of $\bar{y}_{\mathrm{reg}}$ in Eq. (3.1). If the adjustment
factor $q$ is fixed, $\bar{y}_{\mathrm{reg}}$ is unbiased, but if $q$ varies with the sample, $\bar{y}_{\mathrm{reg}}$ has a
bias of $-\mathrm{cov}(q, \bar{z}_S)$. The leading term in the bias of $\bar{y}_{\mathrm{OLS}}$ is

$$-\frac{1}{\sigma_z^2}\left(\frac{1}{n} - \frac{1}{N}\right) \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} e_i (z_i - \bar{z})^2 ,$$

where n is the sample size, N is the population size, and $e_i$ is the prediction
error in the population least squares regression of $y_i$ on $z_i$.

(iv) By analogy, the leading term in the bias of $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ (with a single
covariate $z_i$) is

$$-\frac{1}{\sigma_z^2}\left[\left(\frac{1}{n_A} - \frac{1}{n}\right) \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} a_i^* (z_i - \bar{z})^2 - \left(\frac{1}{n - n_A} - \frac{1}{n}\right) \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} b_i^* (z_i - \bar{z})^2\right].$$

Thus, the bias tends to depend largely on $n$, $n_A/n$, and the importance of
omitted quadratic terms in the regressions of $a_i$ and $b_i$ on $z_i$. With multiple
covariates, it would also depend on the importance of omitted first-order
interactions between the covariates.

(v) Remark (iii) also implies that if the adjustment factors $q_a$ and $q_b$ in
Eqs. (3.2-3.3) do not vary with random assignment, the resulting estimator
of ATE is unbiased. Middleton and Aronow's (2012) insightful paper uses
out-of-sample data to determine $q_a = q_b$. In-sample data can be used when
multiple pretests (pre-randomization outcome measures) are available: if the
only covariate $z_i$ is the most recent pretest, a common adjustment factor
$q_a = q_b$ can be determined by regressing $z_i$ on an earlier pretest.

7. Empirical example. This section suggests empirical checks on the
asymptotic approximations. I will focus on the validity of confidence intervals,
using data from a social experiment for an illustrative example.

7.1. Background. Angrist, Lang, and Oreopoulos (2009; henceforth ALO)
conducted an experiment to estimate the effects of support services and financial
incentives on college students' academic achievement. At a Canadian
university campus, all first-year undergraduates entering in September
2005, except those with a high-school grade point average (GPA) in the top
quartile, were randomly assigned to four groups. One treatment group was
offered support services (peer advising and supplemental instruction). Another
group was offered financial incentives (awards of $1,000 to $5,000 for
meeting a target GPA). A third group was offered both services and incentives.
The control group was eligible only for standard university support
services (which included supplemental instruction for some courses).

ALO report that for women, the combination of services and incentives
had sizable estimated effects on both first- and second-year academic achievement,
even though the programs were only offered during the first year. In
contrast, there was no evidence that services alone or incentives alone had
lasting effects for women or that any of the treatments improved achievement
for men (who were much less likely to contact peer advisors).

To simplify the example and focus on the accuracy of large-sample approximations
in samples that are not huge, I use only the data for men (43 percent
of the students) in the services-and-incentives and services-only groups (9 
percent and 15 percent of the men). First-year GPA data are available for 58 
men in the services-and-incentives group and 99 in the services-only group. 

Table 2 shows alternative estimates of ATE (the average treatment effect 
of the financial incentives, given that the support services were available). 
The services-and-incentives and services-only groups had average first-year 
GPAs of 1.82 and 1.86 (on a scale of 0 to 4), so the unadjusted estimate of
ATE is close to zero. OLS adjustment for high-school GPA hardly makes
a practical difference to either the point estimate of ATE or the sandwich
standard error estimate, regardless of whether the treatment × covariate
interaction is included. 9 The two groups had similar average high-school
GPAs, and high-school GPA was not a strong predictor of first-year college 
GPA. 



Table 2
Estimates of average treatment effect on men's first-year GPA

                          Point estimate    Sandwich SE
Unadjusted                    -0.036           0.158
Usual OLS-adjusted            -0.083           0.146
OLS with interaction          -0.081           0.146


The finding that adjustment appears to have little effect on precision 
is not unusual in social experiments, because the covariates are often only 
weakly correlated with the outcome [Meyer (1995, pp. 100, 116); Lin et al. 
(1998, pp. 129-133)]. Examining eight social experiments with a wide range 
of outcome variables, Schochet (2010) finds $R^2$ values above 0.3 only when
the outcome is a standardized achievement test score or Medicaid costs and 
the covariates include a lagged outcome. 

Researchers may prefer not to adjust when the expected precision improvement
is meager. Either way, confidence intervals for treatment effects
typically rely on either strong parametric assumptions (such as a constant
treatment effect or a normally distributed outcome) or asymptotic approximations.
When a sandwich standard error estimate is multiplied by 1.96 to
form a margin of error for a 95 percent confidence interval, the calculation
assumes the sample is large enough that (i) the estimator of ATE is approximately
normally distributed, (ii) the bias and variability of the sandwich
standard error estimator are small relative to the true standard error (or
else the bias is conservative and the variability is small), and (iii) the bias
of adjustment (if used) is small relative to the true standard error.

9 ALO adjust for a larger set of covariates, including first language, parents' education,
and self-reported procrastination tendencies. These also have little effect on the estimated
standard errors.

Below I discuss a simulation to check for confidence interval undercoverage 
due to violations of (i) or (ii), and a bias estimate to check for violations of 
(iii). These checks are not foolproof, but may provide a useful sniff test. 

7.2. Simulation. For technical reasons, the most revealing initial check 
is a simulation with a constant treatment effect. When treatment effects are 
heterogeneous, the sandwich standard error estimators for $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$ and
$\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ are asymptotically conservative, 10 so nominal 95 percent confidence
intervals for ATE achieve greater than 95 percent coverage in large enough 
samples. A simulation that overstates treatment effect heterogeneity may 
overestimate coverage. 

Table 3 reports a simulation that assumes treatment had no effect on any 
of the men. Keeping the GPA data at their actual values, I replicated the 
experiment 250,000 times, each time randomly assigning 58 men to services- 
and-incentives and 99 to services-only. The first panel shows the means and 
standard deviations of $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$, $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$, and $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$. All three estimators
are approximately unbiased, but adjustment slightly improves precision.
Since the simulation assumes a constant treatment effect (zero), including
the treatment × covariate interaction does not improve precision relative to
the usual adjustment. 
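
The re-randomization scheme just described can be sketched as follows. This is an illustration under stated assumptions, not the code behind Table 3: the outcome and covariate vectors are synthetic stand-ins for the GPA data (which are not reproduced here), the number of replications is kept small, and the function names are hypothetical. Because the treatment effect is assumed to be zero for every unit, the outcome vector stays fixed and only the assignment vector is redrawn.

```python
import numpy as np

def ate_estimates(y, t, z):
    """Return the unadjusted, usual OLS-adjusted, and interacted estimates of ATE."""
    zc = z - z.mean()  # center the covariate at its full-sample mean
    unadj = y[t == 1].mean() - y[t == 0].mean()
    X_adj = np.column_stack([np.ones_like(y), t, zc])
    adj = np.linalg.lstsq(X_adj, y, rcond=None)[0][1]
    X_int = np.column_stack([np.ones_like(y), t, zc, t * zc])
    interact = np.linalg.lstsq(X_int, y, rcond=None)[0][1]
    return unadj, adj, interact

def simulate(y, z, n_treated, reps, seed=0):
    """Re-randomize treatment with outcomes held fixed (zero treatment effect)."""
    rng = np.random.default_rng(seed)
    n = y.size
    draws = np.empty((reps, 3))
    for r in range(reps):
        t = np.zeros(n)
        t[rng.choice(n, size=n_treated, replace=False)] = 1.0
        draws[r] = ate_estimates(y, t, z)
    return draws.mean(axis=0), draws.std(axis=0, ddof=1)

# Synthetic stand-in data (assumption): 157 units, 58 assigned to treatment.
rng = np.random.default_rng(42)
n, n_treated = 157, 58
z = rng.normal(3.0, 0.4, size=n)
y = 1.8 + 1.5 * (z - 3.0) + rng.normal(0.0, 0.8, size=n)

mean_est, sd_est = simulate(y, z, n_treated, reps=2000)
print("means:", mean_est)
print("SDs:  ", sd_est)
```

The means play the role of the estimated biases in the first panel of Table 3, and the standard deviations play the role of the true standard errors against which the standard error estimators are judged.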

The second and third panels show the estimated biases and standard de- 
viations of the sandwich standard error estimator and the three variants 
discussed in Angrist and Pischke (2009, pp. 294-308). ALO's paper uses 
HC1 [Hinkley (1977)], which simply multiplies the sandwich variance estimator
by $n/(n - k)$, where $k$ is the number of regressors. HC2 [see remark
(v) after Theorem 2] and the approximate jackknife HC3 [Davidson and
MacKinnon (1993); Tibshirani (1986)] inflate the squared residuals in the
sandwich formula by the factors $(1 - h_{ii})^{-1}$ and $(1 - h_{ii})^{-2}$, where $h_{ii}$ is the
ith diagonal element of the hat matrix $X(X'X)^{-1}X'$. All the standard error
estimators appear to be approximately unbiased with low variability. 
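
The four standard error estimators differ only in how the squared residuals are weighted. A minimal sketch, assuming NumPy and synthetic data (the function name `sandwich_covariances` and the data-generating values are hypothetical, chosen only for illustration):

```python
import numpy as np

def sandwich_covariances(X, y):
    """Classic sandwich (HC0) and the HC1, HC2, HC3 variants for OLS of y on X."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # hat-matrix diagonal h_ii

    def sandwich(weights):
        return XtX_inv @ ((X * weights[:, None]).T @ X) @ XtX_inv

    e2 = resid**2
    return {
        "HC0": sandwich(e2),                   # classic sandwich
        "HC1": sandwich(e2) * n / (n - k),     # degrees-of-freedom rescaling
        "HC2": sandwich(e2 / (1.0 - h)),       # residuals inflated by (1 - h_ii)^-1
        "HC3": sandwich(e2 / (1.0 - h) ** 2),  # residuals inflated by (1 - h_ii)^-2
    }

# Synthetic illustration: usual OLS adjustment with one centered covariate.
rng = np.random.default_rng(0)
n = 157
t = np.zeros(n)
t[rng.choice(n, size=58, replace=False)] = 1.0
z = rng.normal(size=n)
y = 1.8 + 0.3 * z + rng.normal(size=n)
X = np.column_stack([np.ones(n), t, z - z.mean()])

se = {name: float(np.sqrt(V[1, 1])) for name, V in sandwich_covariances(X, y).items()}
print(se)
```

Since every leverage lies strictly between 0 and 1, the HC2 and HC3 weights are never smaller than the HC0 weights, so the resulting standard errors are ordered HC0 ≤ HC2 ≤ HC3.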

The fourth and fifth panels evaluate thirteen ways of constructing a 95 
percent confidence interval. For each of the three estimators of ATE, each of 
the four standard error estimators was multiplied by 1.96 to form the margin 
of error for a normal-approximation interval. Welch's (1949) t-interval [Miller 
(1986, pp. 60-62)] was also constructed. Welch's interval uses $\widehat{\mathrm{ATE}}_{\mathrm{unadj}}$,
the HC2 standard error estimator, and the t-distribution with the Welch-Satterthwaite
approximate degrees of freedom.

10 By Theorem 2, the sandwich standard error estimator for $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ is also asymptotically
conservative unless the treatment effect is a linear function of the covariates.

Table 3
Simulation with zero treatment effect (250,000 replications). The fourth panel shows the
empirical coverage rates of nominal 95 percent confidence intervals. All other estimates
are on the four-point GPA scale.

                                            ATE estimator
                              Unadjusted   Usual OLS-adjusted   OLS with interaction
Bias & SD of ATE estimator
  Mean (estimated bias)          0.000           0.000                 0.000
  SD                             0.158           0.147                 0.147
Bias of SE estimator
  Classic sandwich              -0.001          -0.002                -0.002
  HC1                            0.000           0.000                 0.000
  HC2                            0.000           0.000                 0.000
  HC3                            0.001           0.002                 0.002
SD of SE estimator
  Classic sandwich               0.004           0.004                 0.004
  HC1                            0.004           0.004                 0.004
  HC2                            0.004           0.004                 0.004
  HC3                            0.004           0.004                 0.005
CI coverage (percent)
  Classic sandwich               94.6            94.5                  94.4
  HC1                            94.8            94.7                  94.7
  HC2 (normal)                   94.8            94.8                  94.8
  HC2 (Welch t)                  95.1
  HC3                            95.0            95.0                  95.1
CI width (average)
  Classic sandwich               0.618           0.570                 0.568
  HC1                            0.622           0.576                 0.575
  HC2 (normal)                   0.622           0.576                 0.577
  HC2 (Welch t)                  0.629
  HC3                            0.627           0.583                 0.586

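
The ingredients of Welch's interval can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `welch_components` and the toy data are hypothetical. It returns the difference in means, the unpooled standard error (which coincides with the HC2 standard error, by remark (v) after Theorem 2), and the Welch-Satterthwaite approximate degrees of freedom. The interval itself would then use the corresponding t quantile in place of 1.96.

```python
from statistics import mean, variance

def welch_components(y_a, y_b):
    """Difference in means, unpooled SE, and Welch-Satterthwaite degrees of freedom."""
    v_a = variance(y_a) / len(y_a)  # sample variance of group A over its size
    v_b = variance(y_b) / len(y_b)  # sample variance of group B over its size
    estimate = mean(y_a) - mean(y_b)
    se = (v_a + v_b) ** 0.5
    df = (v_a + v_b) ** 2 / (v_a**2 / (len(y_a) - 1) + v_b**2 / (len(y_b) - 1))
    return estimate, se, df

# Toy data, for illustration only.
estimate, se, df = welch_components(
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
)
print(estimate, se, df)
```

The approximate degrees of freedom always fall between the smaller group's size minus one and the total sample size minus two, so the resulting interval is at least as wide as the normal-approximation interval built from the same standard error.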

The fourth panel shows that all thirteen confidence intervals cover the 
true value of ATE (zero) with approximately 95 percent probability. The 
fifth panel shows the average widths of the intervals. (The mean and median 
widths agree up to three decimal places.) The regression-adjusted intervals 
are narrower on average than the unadjusted intervals, but the improve- 
ment is meager. In sum, adjustment appears to yield slightly more precise 
inference without sacrificing validity.

7.3. Bias estimates. One limitation of the simulation above is that the
bias of adjustment may be larger when treatment effects are heterogeneous.
With a single covariate $z_i$, the leading term in the bias of $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ is 11

$$-\frac{1}{n}\,\frac{1}{\sigma_z^2} \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \left[(a_i - b_i) - \mathrm{ATE}\right](z_i - \bar{z})^2 .$$

Thus, with a constant treatment effect, the leading term is zero (and the bias
is of order $n^{-3/2}$ or smaller). Freedman (2008b) shows that with a balanced
design and a constant treatment effect, the bias is exactly zero.
We can estimate the leading term by rewriting it as

$$-\frac{1}{n}\,\frac{1}{\sigma_z^2} \left[\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} (a_i - \bar{a})(z_i - \bar{z})^2 - \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} (b_i - \bar{b})(z_i - \bar{z})^2\right]$$

and substituting the sample variance of high-school GPA for $\sigma_z^2$, and the
sample covariances of first-year college GPA with the square of centered
high-school GPA in the services-and-incentives and services-only groups for
the bracketed limits. The resulting estimate of the bias of $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ is −0.0002
on the four-point GPA scale. Similarly, the leading term in the bias of
$\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ [Section 6, remark (iv)] can be estimated, and the result is also
−0.0002. The biases would need to be orders of magnitude larger to have
noticeable effects on confidence interval coverage (the estimated standard
errors of $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ and $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ in Table 2 are both 0.146).
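
The plug-in calculation for $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ can be sketched as follows. This is a minimal illustration, assuming NumPy, a single covariate, and two conventions that the text does not spell out and that are therefore assumptions here: the covariate is centered at its full-sample mean, and sample variances and covariances use the n − 1 divisor. The function name and toy data are hypothetical.

```python
import numpy as np

def adj_bias_leading_term(y, t, z):
    """Plug-in estimate of the leading term in the bias of the usual OLS-adjusted estimator."""
    n = y.size
    zc2 = (z - z.mean()) ** 2          # square of the centered covariate
    var_z = z.var(ddof=1)              # sample variance of the covariate
    cov_a = np.cov(y[t == 1], zc2[t == 1])[0, 1]  # treatment-group covariance
    cov_b = np.cov(y[t == 0], zc2[t == 0])[0, 1]  # control-group covariance
    return -(cov_a - cov_b) / (n * var_z)

# Toy data: the outcome equals the squared centered covariate in the treated
# group and is constant in the control group, so only cov_a is nonzero.
z_demo = np.arange(10.0)
t_demo = np.array([1.0, 0.0] * 5)
y_demo = t_demo * (z_demo - z_demo.mean()) ** 2
bias_demo = adj_bias_leading_term(y_demo, t_demo, z_demo)
print(bias_demo)
```

With a constant treatment effect the two within-group covariances estimate the same quantity, so the estimate should be close to zero apart from sampling error.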

7.4. Remarks. (i) This exercise does not prove that the bias of adjustment
is negligible, since it just replaces a first-order approximation (the bias
is close to zero in large enough samples) with a second-order approximation 
(the bias is close to the leading term in large enough samples), and the es- 
timate of the leading term has sampling error. 12 The checks suggested here 
cannot validate an analysis, but they can reveal problems. 

(ii) Another limitation is that the simulation assumes the potential outcome
distributions have the same shape. In Stonehouse and Forrester's
(1998) simulations, Welch's t-test was not robust to extreme skewness in
the smaller group when that group's sample size was 30 or smaller. That
does not appear to be a serious issue in this example, however. The distribution
of men's first-year GPA in the services-and-incentives group is roughly
symmetric (e.g., see ALO, Fig. 1A).

11 An equivalent expression appears in the version of Freedman (2008a) on his web page.
It can be derived from Freedman (2008b) after correcting a minor error in Eqs. (17-18):
the potential outcomes should be centered.

12 Finite-population bootstrap methods [Davison and Hinkley (1997, pp. 92-100, 125)]
may also be useful for estimating the bias of $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$, but similar caveats would apply.

(iii) The simulation check may appear to resemble permutation inference 
[Fisher (1935); Tukey (1993); Rosenbaum (2002)], but the goals differ. Here, 
the constant treatment effect scenario just gives a benchmark to check the 
finite-sample coverage of confidence intervals that are asymptotically valid 
under weaker assumptions. Classical permutation methods achieve exact in- 
ference under strong assumptions about treatment effects, but may give mis- 
leading results when the assumptions fail. For example, the Fisher-Pitman 
permutation test is asymptotically equivalent to a t-test using the conven- 
tional OLS standard error estimator. The test can be inverted to give exact 
confidence intervals for a constant treatment effect, but these intervals may 
undercover ATE when treatment effects are heterogeneous and the design is 
imbalanced [Gail et al. (1996)]. 

(iv) Chung and Romano (2011ab) discuss and extend a literature on permutation
tests that do remain valid asymptotically when the null hypothesis
is weakened. One such test is based on the permutation distribution of a 
heteroskedasticity-robust t-statistic. Exploration of this approach under the 
Neyman model (with and without covariate adjustment) would be valuable. 

8. Further remarks. Freedman's papers answer important questions 
about the properties of OLS adjustment. He and others have summarized 
his results with a "glass is half empty" view that highlights the dangers of 
adjustment. To the extent that this view encourages researchers to present 
unadjusted estimates first, it is probably a good influence. The difference in 
means is the "hands above the table" estimate: it is clearly not the product 
of a specification search, and its transparency may encourage discussion of 
the strengths and weaknesses of the data and research design. 13 

But it would be unwise to conclude that Freedman's critique should al- 
ways override the arguments for adjustment, or that studies reporting only 
adjusted estimates should always be distrusted. Freedman's own work shows 
that with large enough samples and balanced two-group designs, randomization
justifies the traditional adjustment. One does not need to believe in
the classical linear model to tolerate or even advocate OLS adjustment, just 
as one does not need to believe in the Four Noble Truths of Buddhism to 
entertain the hypothesis that mindfulness meditation has causal effects on 
mental health. 



13 On transparency and critical discussion, see Ashenfelter and Plant (1990), Freedman 
(1991, 2008c, 2010), Moher et al. (2010), and Rosenbaum (2010, ch. 6).

From an agnostic perspective, Freedman's theorems are a major contribution.
Three-quarters of a century after Fisher discovered the analysis of
covariance, Freedman deepened our understanding of its properties by de- 
riving the regression-adjusted estimator's asymptotic distribution without 
assuming a regression model, a constant treatment effect, or an infinite su- 
perpopulation. His argument is constructed with unsurpassed clarity and 
rigor. It deserves to be studied in detail and considered carefully. 

TECHNICAL APPENDIX 
A.1. Additional notation and definitions. In the main paper:

• Section 2 defines the basic notation; 

• Section 4.2.1 states Conditions 1-3; 

• Section 4.2.2 defines the vectors $Q_a$ and $Q_b$ and the prediction errors
$a_i^*$ and $b_i^*$, and introduces the $\sigma_x^2$ and $\sigma_{x,y}$ notation for population
variances and covariances;

• Section 5 defines the vector Q and the prediction errors a** and b**. 

Let $\tilde{p}_A = n_A/n$ [as in remark (iii) after Corollary 1.2].
Extend Section 2's notation for population and group means to cover any
scalar, vector, or matrix expression. For example:

$$\overline{ab}_A = \frac{1}{n_A}\sum_{i\in A} a_i b_i, \qquad \overline{az}_A = \frac{1}{n_A}\sum_{i\in A} a_i z_i, \qquad \overline{z'z}_A = \frac{1}{n_A}\sum_{i\in A} z_i' z_i .$$

Extend Freedman's (2008b) angle bracket notation to cover all the finite
limits assumed in Condition 2. For example:

$$\langle az \rangle = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} a_i z_i, \qquad \langle z'z \rangle = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} z_i' z_i .$$

(The second limit exists since it is a submatrix of $\lim_{n\to\infty} n^{-1} Z'Z$.)

Condition 4 (centering) will sometimes be assumed for convenience. The 
proofs will explain why this can be done without loss of generality. 

Condition 4. The population means of the potential outcomes and the
covariates are zero: $\bar{a} = \bar{b} = 0$ and $\bar{z} = 0$.

Some transformations of the regressors will be useful in the proofs. Define
the pooled-slopes regression estimator of mean potential outcomes, $\hat{\beta}_{\mathrm{adj}}$, as
the 2 × 1 vector containing the estimated coefficients on $T_i$ and $1 - T_i$ from
the no-intercept OLS regression of $Y_i$ on $T_i$, $1 - T_i$, and $z_i - \bar{z}$. Let $\hat{Q}$ denote
the vector of estimated coefficients on $z_i - \bar{z}$ from the same regression.
The vector $\hat{\beta}_{\mathrm{adj}}$ is an estimate of $\beta = (\bar{a}, \bar{b})'$. By well-known invariance
properties of least squares, $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ is the difference between the two elements
of $\hat{\beta}_{\mathrm{adj}}$.

Similarly, define the separate-slopes regression estimator of mean potential
outcomes, $\hat{\beta}_{\mathrm{interact}}$, as the 2 × 1 vector containing the estimated coefficients
on $T_i$ and $1 - T_i$ from the no-intercept OLS regression of $Y_i$ on $T_i$,
$1 - T_i$, $z_i - \bar{z}$, and $T_i(z_i - \bar{z})$. Then $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ is the difference between the
two elements of $\hat{\beta}_{\mathrm{interact}}$.

Let $\hat{Q}_a$ and $\hat{Q}_b$ denote the vectors of estimated coefficients on $z_i$ in the
OLS regressions of $Y_i$ on $z_i$ in groups A and B, respectively.

Conditions 1-3 do not rule out the possibility that under some realizations
of random assignment, the regressors are perfectly collinear. The probability
of this event converges to zero by Conditions 2 and 3, so it is irrelevant
to the asymptotic results. For concreteness, whenever $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ cannot be
computed because of collinearity, let $\widehat{\mathrm{ATE}}_{\mathrm{adj}} = \bar{Y}_A - \bar{Y}_B$, $\hat{Q} = 0$, and $\hat{\beta}_{\mathrm{adj}} =
(\bar{Y}_A, \bar{Y}_B)'$; whenever $\widehat{\mathrm{ATE}}_{\mathrm{interact}}$ cannot be computed, let $\widehat{\mathrm{ATE}}_{\mathrm{interact}} =
\bar{Y}_A - \bar{Y}_B$, $\hat{Q}_a = 0$, $\hat{Q}_b = 0$, and $\hat{\beta}_{\mathrm{interact}} = (\bar{Y}_A, \bar{Y}_B)'$. Other arbitrary
values could be used.

A.2. Lemmas. Lemma 1 is a finite-population version of the Weak Law
of Large Numbers. 

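Lemma 1. Assume Conditions 1-3. The means over group A or group
B of $a_i$, $b_i$, $z_i$, $a_i^2$, $b_i^2$, $z_i'z_i$, $a_i b_i$, $a_i z_i$, and $b_i z_i$ converge in probability to the
limits of the population means. For example:

$$\bar{a}_A \xrightarrow{p} \langle a \rangle, \qquad \overline{a^2}_A \xrightarrow{p} \langle a^2 \rangle, \qquad \overline{ab}_A \xrightarrow{p} \langle ab \rangle, \qquad \overline{az}_A \xrightarrow{p} \langle az \rangle, \qquad \overline{z'z}_A \xrightarrow{p} \langle z'z \rangle .$$

Proof. From basic results on simple random sampling [e.g., Freedman's
(2008b) Proposition 1], $E(\bar{a}_A) = \bar{a}$ and

$$\mathrm{var}(\bar{a}_A) = \frac{1}{n-1}\,\frac{1 - \tilde{p}_A}{\tilde{p}_A}\,\sigma_a^2 .$$

As $n \to \infty$, $\tilde{p}_A \to p_A > 0$ and $\sigma_a^2 \to \langle a^2 \rangle - \langle a \rangle^2$, so $\mathrm{var}(\bar{a}_A) \to 0$. By
Chebyshev's inequality, $\bar{a}_A - \bar{a} \xrightarrow{p} 0$. Therefore,

$$\bar{a}_A \xrightarrow{p} \lim_{n\to\infty} \bar{a} = \langle a \rangle .$$

The proofs that $\overline{a^2}_A \xrightarrow{p} \langle a^2 \rangle$ and $\overline{ab}_A \xrightarrow{p} \langle ab \rangle$ are similar but rely on
Condition 1 to show that $\mathrm{var}(\overline{a^2}_A) \to 0$ and $\mathrm{var}(\overline{ab}_A) \to 0$. First note that

$$\mathrm{var}(\overline{a^2}_A) = \frac{1}{n-1}\,\frac{1 - \tilde{p}_A}{\tilde{p}_A}\,\sigma^2_{(a^2)}$$

and

$$\mathrm{var}(\overline{ab}_A) = \frac{1}{n-1}\,\frac{1 - \tilde{p}_A}{\tilde{p}_A}\,\sigma^2_{(ab)} .$$

By Condition 1, $\sigma^2_{(a^2)}$ is bounded:

$$\sigma^2_{(a^2)} \le \overline{a^4} < L .$$

Therefore, $\mathrm{var}(\overline{a^2}_A) \to 0$. Next note that $\sigma^2_{(ab)}$ is bounded, using the Cauchy-Schwarz
inequality:

$$\sigma^2_{(ab)} \le \frac{1}{n}\sum_{i=1}^{n} a_i^2 b_i^2 \le \left(\frac{1}{n}\sum_{i=1}^{n} a_i^4\right)^{1/2} \left(\frac{1}{n}\sum_{i=1}^{n} b_i^4\right)^{1/2} < L .$$

Therefore, $\mathrm{var}(\overline{ab}_A) \to 0$.

The same logic can be used to show the remaining results. Those involving
$z_i$ can be proved element by element. □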

Lemma 2. The pooled-slopes estimator of mean potential outcomes is

$$\hat{\beta}_{\mathrm{adj}} = \left[\bar{Y}_A - (\bar{z}_A - \bar{z})\hat{Q},\ \ \bar{Y}_B - (\bar{z}_B - \bar{z})\hat{Q}\right]' .$$

Proof. The residuals from the regression defining $\hat{\beta}_{\mathrm{adj}}$ are uncorrelated
with $T_i$ and $1 - T_i$. Therefore, the regression line passes through the points
of means within groups A and B, and the result follows. □

Lemma 3. The separate-slopes estimator of mean potential outcomes is

$$\hat{\beta}_{\mathrm{interact}} = \left[\bar{Y}_A - (\bar{z}_A - \bar{z})\hat{Q}_a,\ \ \bar{Y}_B - (\bar{z}_B - \bar{z})\hat{Q}_b\right]' .$$

Proof. In the regression defining $\hat{\beta}_{\mathrm{interact}}$, the coefficient on $z_i - \bar{z}$ is $\hat{Q}_b$
and the coefficient on $T_i(z_i - \bar{z})$ is $\hat{Q}_a - \hat{Q}_b$. (This can be shown from the
equivalence of the minimization problems.) The rest of the proof is similar
to that of Lemma 2. □
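
Lemmas 2 and 3 are exact algebraic identities, so they can be checked numerically on any data set without collinearity. The following is a minimal sketch, assuming NumPy and synthetic data with two covariates; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_a = 40, 15
z = rng.normal(size=(n, 2))
t = np.zeros(n)
t[rng.choice(n, size=n_a, replace=False)] = 1.0
y = 1.0 + z @ np.array([0.5, -0.3]) + 0.7 * t + rng.normal(size=n)
zc = z - z.mean(axis=0)          # covariates centered at the full-sample mean
in_a, in_b = t == 1, t == 0

# Pooled slopes: no-intercept regression of Y on T, 1 - T, and z - zbar.
coef = np.linalg.lstsq(np.column_stack([t, 1 - t, zc]), y, rcond=None)[0]
beta_adj, q_hat = coef[:2], coef[2:]
lemma2 = np.array([
    y[in_a].mean() - zc[in_a].mean(axis=0) @ q_hat,
    y[in_b].mean() - zc[in_b].mean(axis=0) @ q_hat,
])

# Separate slopes: add the interaction T * (z - zbar).
X_int = np.column_stack([t, 1 - t, zc, t[:, None] * zc])
coef_int = np.linalg.lstsq(X_int, y, rcond=None)[0]
beta_interact = coef_int[:2]

def group_slopes(mask):
    """Slopes from the OLS regression of Y on z (with intercept) within one group."""
    Xg = np.column_stack([np.ones(mask.sum()), z[mask]])
    return np.linalg.lstsq(Xg, y[mask], rcond=None)[0][1:]

q_a, q_b = group_slopes(in_a), group_slopes(in_b)
lemma3 = np.array([
    y[in_a].mean() - zc[in_a].mean(axis=0) @ q_a,
    y[in_b].mean() - zc[in_b].mean(axis=0) @ q_b,
])

print(np.allclose(beta_adj, lemma2), np.allclose(beta_interact, lemma3))
```

The same run also confirms the statement in the proof of Lemma 3: the coefficients on the centered covariates equal the group-B slopes, and the interaction coefficients equal the difference between the group-A and group-B slopes.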

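Lemma 4. Assume Conditions 1-3. Then $\hat{Q} \xrightarrow{p} Q$.

Proof. We can assume Condition 4 without loss of generality: Let $\hat{\gamma}$ be
the estimated coefficient vector from a no-intercept OLS regression of $Y_i$ on
$T_i$, $1 - T_i$, and $z_i - \bar{z}$. Let $\tilde{a}_i = a_i - \bar{a}$ and $\tilde{b}_i = b_i - \bar{b}$, so that Condition 4 holds
for $\tilde{a}_i$ and $\tilde{b}_i$. Let $\tilde{Y}_i = \tilde{a}_i T_i + \tilde{b}_i (1 - T_i)$. By a well-known property of OLS
[e.g., Freedman's (2008b) Lemma A.1], the estimated coefficient vector from
a no-intercept OLS regression of $\tilde{Y}_i$ on $T_i$, $1 - T_i$, and $z_i - \bar{z}$ is $\hat{\gamma} - (\bar{a}, \bar{b}, 0)'$,
so $\hat{Q}$ is unchanged. Similarly, $Q$ is unchanged. Finally, centering $z_i$ has no
effect on the slope vectors $\hat{Q}$ and $Q$.

By the Frisch-Waugh-Lovell theorem, $\hat{Q}$ can be computed from auxiliary
regressions: Let

$$e_i = Y_i - \bar{Y}_A T_i - \bar{Y}_B (1 - T_i),$$
$$\tilde{z}_i = z_i - \bar{z}_A T_i - \bar{z}_B (1 - T_i).$$

Then

$$\hat{Q} = \left[\frac{1}{n}\sum_{i=1}^{n} \tilde{z}_i' \tilde{z}_i\right]^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} \tilde{z}_i' e_i\right].$$

Some algebra yields

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{z}_i' \tilde{z}_i = \overline{z'z} - \tilde{p}_A \bar{z}_A' \bar{z}_A - (1 - \tilde{p}_A)\bar{z}_B' \bar{z}_B .$$

By Condition 4 and Lemma 1, $\bar{z}_A \xrightarrow{p} 0$ and $\bar{z}_B \xrightarrow{p} 0$. Therefore,

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{z}_i' \tilde{z}_i \xrightarrow{p} \langle z'z \rangle .$$

Now note that

$$e_i = (a_i - \bar{a}_A) T_i + (b_i - \bar{b}_B)(1 - T_i),$$
$$\tilde{z}_i = (z_i - \bar{z}_A) T_i + (z_i - \bar{z}_B)(1 - T_i).$$

Therefore,

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{z}_i' e_i = \frac{1}{n}\sum_{i\in A} (z_i - \bar{z}_A)'(a_i - \bar{a}_A) + \frac{1}{n}\sum_{i\in B} (z_i - \bar{z}_B)'(b_i - \bar{b}_B)$$
$$= \tilde{p}_A (\overline{az}_A - \bar{a}_A \bar{z}_A)' + (1 - \tilde{p}_A)(\overline{bz}_B - \bar{b}_B \bar{z}_B)'$$
$$\xrightarrow{p} p_A \langle az \rangle' + (1 - p_A) \langle bz \rangle' .$$

(Convergence to the last expression follows from Lemma 1 and Conditions
3-4.)

It follows that

$$\hat{Q} \xrightarrow{p} \langle z'z \rangle^{-1} \left[p_A \langle az \rangle' + (1 - p_A) \langle bz \rangle'\right]$$
$$= p_A \lim_{n\to\infty} \left[\sum_{i=1}^{n} z_i' z_i\right]^{-1} \sum_{i=1}^{n} z_i' a_i + (1 - p_A) \lim_{n\to\infty} \left[\sum_{i=1}^{n} z_i' z_i\right]^{-1} \sum_{i=1}^{n} z_i' b_i$$
$$= p_A Q_a + (1 - p_A) Q_b = Q .$$

□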



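Lemma 5. Assume Conditions 1-3. Then $\hat{Q}_a \xrightarrow{p} Q_a$ and $\hat{Q}_b \xrightarrow{p} Q_b$.

Proof. The proof is similar to that of Lemma 4 but simpler. Again, we
can assume Condition 4 without loss of generality. By the Frisch-Waugh-Lovell
theorem,

$$\hat{Q}_a = \left[\frac{1}{n_A}\sum_{i\in A} (z_i - \bar{z}_A)'(z_i - \bar{z}_A)\right]^{-1} \left[\frac{1}{n_A}\sum_{i\in A} (z_i - \bar{z}_A)'(a_i - \bar{a}_A)\right].$$

Some algebra, Lemma 1, and Condition 4 yield

$$\frac{1}{n_A}\sum_{i\in A} (z_i - \bar{z}_A)'(z_i - \bar{z}_A) = \overline{z'z}_A - \bar{z}_A' \bar{z}_A \xrightarrow{p} \langle z'z \rangle$$

and

$$\frac{1}{n_A}\sum_{i\in A} (z_i - \bar{z}_A)'(a_i - \bar{a}_A) = (\overline{az}_A - \bar{a}_A \bar{z}_A)' \xrightarrow{p} \langle az \rangle' ,$$

so

$$\hat{Q}_a \xrightarrow{p} \langle z'z \rangle^{-1} \langle az \rangle' = \lim_{n\to\infty} \left[\sum_{i=1}^{n} z_i' z_i\right]^{-1} \sum_{i=1}^{n} z_i' a_i = Q_a .$$

The proof that $\hat{Q}_b \xrightarrow{p} Q_b$ is similar. □

Lemma 6 is similar to part of Freedman's (2008b) Theorem 2.

Lemma 6. Assume Conditions 1-3. Then

$$\sqrt{n}(\hat{\beta}_{\mathrm{adj}} - \beta) \xrightarrow{d} N(0, V)$$

where

$$V = \begin{pmatrix} \dfrac{1 - p_A}{p_A} \lim_{n\to\infty} \sigma^2_{a^{**}} & -\lim_{n\to\infty} \sigma_{a^{**},b^{**}} \\[2ex] -\lim_{n\to\infty} \sigma_{a^{**},b^{**}} & \dfrac{p_A}{1 - p_A} \lim_{n\to\infty} \sigma^2_{b^{**}} \end{pmatrix}.$$

Proof. We can assume Condition 4 without loss of generality: Centering
$a_i$, $b_i$, and $z_i$ has no effect on $\hat{Q}$ and $Q$, as shown in the proof of Lemma 4,
so it subtracts $(\bar{a}, \bar{b})'$ from both $\hat{\beta}_{\mathrm{adj}}$ (see Lemma 2) and $\beta$, and it has no
effect on the elements of $V$.

Condition 4 and Lemma 2 imply that

$$\sqrt{n}(\hat{\beta}_{\mathrm{adj}} - \beta) = \sqrt{n}(\bar{Y}_A - \bar{z}_A \hat{Q},\ \bar{Y}_B - \bar{z}_B \hat{Q})'$$
$$= \sqrt{n}(\bar{a}_A - \bar{z}_A Q,\ \bar{b}_B - \bar{z}_B Q)' - \left[\sqrt{n}\,\bar{z}_A (\hat{Q} - Q),\ \sqrt{n}\,\bar{z}_B (\hat{Q} - Q)\right]' .$$

By a finite-population Central Limit Theorem [Freedman's (2008b) Theorem
1], $\sqrt{n}\,\bar{z}_A$ and $\sqrt{n}\,\bar{z}_B$ are $O_p(1)$, and by Lemma 4, $\hat{Q} - Q$ is $o_p(1)$.
Therefore,

$$\left[\sqrt{n}\,\bar{z}_A (\hat{Q} - Q),\ \sqrt{n}\,\bar{z}_B (\hat{Q} - Q)\right]' \xrightarrow{p} 0 .$$

The conclusion follows from Freedman's (2008b) Theorem 1 with $a$ and $b$
replaced by $a - zQ$ and $b - zQ$. □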

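Lemma 7 is an application of the Weak Law of Large Numbers (Lemma 1).

Lemma 7. Assume Conditions 1-3. Let $\theta$ be any K × 1 vector that is
constant as $n \to \infty$. Then

$$\frac{1}{n_A}\sum_{i\in A} (a_i + z_i\theta)^2 \xrightarrow{p} \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} (a_i + z_i\theta)^2 ,$$
$$\frac{1}{n - n_A}\sum_{i\in B} (b_i + z_i\theta)^2 \xrightarrow{p} \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} (b_i + z_i\theta)^2 .$$

Proof. Using Lemma 1,

$$\frac{1}{n_A}\sum_{i\in A} (a_i + z_i\theta)^2 = \overline{a^2}_A + 2\,\overline{az}_A\theta + \theta'\,\overline{z'z}_A\,\theta$$
$$\xrightarrow{p} \langle a^2 \rangle + 2\langle az \rangle\theta + \theta'\langle z'z \rangle\theta$$
$$= \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} (a_i + z_i\theta)^2 .$$

The proof of the other assertion is analogous. □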



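Lemma 8 shows that the sandwich variance estimator for $\widehat{\mathrm{ATE}}_{\mathrm{adj}}$ is invariant
to the transformation of the regressors that was used to define $\hat{\beta}_{\mathrm{adj}}$.

Lemma 8. Let

$$W = (\tilde{X}'\tilde{X})^{-1} \left[\sum_{i=1}^{n} \tilde{e}_i^2\, \tilde{x}_i' \tilde{x}_i\right] (\tilde{X}'\tilde{X})^{-1}$$

where $\tilde{X}$ is the n × (K + 2) matrix with row $i$ equal to $\tilde{x}_i = (T_i, 1 - T_i, z_i - \bar{z})$
and $\tilde{e}_i$ is the residual from the no-intercept OLS regression of $Y_i$ on $\tilde{x}_i$. Then
$\hat{v}_{\mathrm{adj}} = W_{11} + W_{22} - 2W_{12}$, where $W_{ij}$ is the $(i, j)$ element of $W$.

Proof. By definition, $\hat{v}_{\mathrm{adj}}$ is the (2, 2) element of

$$(X'X)^{-1} X' \mathrm{diag}(\hat{e}_1^2, \ldots, \hat{e}_n^2) X (X'X)^{-1} = (X'X)^{-1} \left[\sum_{i=1}^{n} \hat{e}_i^2\, x_i' x_i\right] (X'X)^{-1}$$

where $X$ is the n × (K + 2) matrix whose ith row is $x_i = (1, T_i, z_i)$ and $\hat{e}_i$ is
the residual from the OLS regression of $Y_i$ on $x_i$.

The OLS residuals are invariant to the linear transformation of regressors,
so $\tilde{e}_i = \hat{e}_i$ for $i = 1, 2, \ldots, n$. Also, $X = \tilde{X}RS$ where

$$R = \begin{pmatrix} M & 0 \\ 0 & I_K \end{pmatrix}, \qquad S = \begin{pmatrix} I_2 & L \\ 0 & I_K \end{pmatrix},$$

and

$$M = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \qquad L = \begin{pmatrix} \bar{z} \\ 0 \end{pmatrix}.$$

Note that $R$ is symmetric but $S$ is not, and

$$R^{-1} = \begin{pmatrix} M^{-1} & 0 \\ 0 & I_K \end{pmatrix}, \qquad M^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix}, \qquad S^{-1} = \begin{pmatrix} I_2 & -L \\ 0 & I_K \end{pmatrix}.$$

Therefore,

$$(X'X)^{-1} X' \mathrm{diag}(\hat{e}_1^2, \ldots, \hat{e}_n^2) X (X'X)^{-1} = S^{-1} R^{-1} W (R^{-1})' (S^{-1})' .$$

The (2, 2) element is $W_{11} + W_{22} - 2W_{12}$. □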
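
Lemma 8 is also an exact identity and can be checked numerically. A minimal sketch, assuming NumPy, the classic (uncorrected) sandwich formula, and synthetic heteroskedastic data; the helper name `classic_sandwich` is hypothetical.

```python
import numpy as np

def classic_sandwich(X, y):
    """(X'X)^{-1} [sum of e_i^2 x_i' x_i] (X'X)^{-1} for the OLS regression of y on X."""
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))
    return XtX_inv @ ((X * (resid**2)[:, None]).T @ X) @ XtX_inv

rng = np.random.default_rng(2)
n = 60
z = rng.normal(size=(n, 2))
t = np.zeros(n)
t[rng.choice(n, size=25, replace=False)] = 1.0
y = 0.5 * t + z @ np.array([1.0, -0.5]) + rng.normal(size=n) * (1.0 + t)

# Usual parameterization: rows (1, T_i, z_i); the variance estimate for the
# adjusted ATE is the (2, 2) element, i.e. index [1, 1] with zero-based indexing.
X_usual = np.column_stack([np.ones(n), t, z])
v_adj = classic_sandwich(X_usual, y)[1, 1]

# Transformed parameterization: rows (T_i, 1 - T_i, z_i - zbar).
X_tilde = np.column_stack([t, 1.0 - t, z - z.mean(axis=0)])
W = classic_sandwich(X_tilde, y)
v_from_W = W[0, 0] + W[1, 1] - 2.0 * W[0, 1]

print(v_adj, v_from_W)
```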
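Lemma 9 is important for the proof of Theorem 2.

Lemma 9. Assume Conditions 1-4. Let $\hat{e}_i$ denote the residual from the
no-intercept OLS regression of $Y_i$ on $T_i$, $1 - T_i$, and $z_i$. Then

$$\frac{1}{n_A}\sum_{i\in A} \hat{e}_i^2 \xrightarrow{p} \lim_{n\to\infty} \sigma^2_{a^{**}}, \qquad \frac{1}{n - n_A}\sum_{i\in B} \hat{e}_i^2 \xrightarrow{p} \lim_{n\to\infty} \sigma^2_{b^{**}},$$

and $n^{-1}\sum_{i\in A} \hat{e}_i^2 z_i$, $n^{-1}\sum_{i\in B} \hat{e}_i^2 z_i$, and $n^{-1}\sum_{i=1}^{n} \hat{e}_i^2 z_i' z_i$ are all $O_p(1)$.

Proof. Let $\hat{\beta}_{\mathrm{adj}(1)}$ and $\hat{\beta}_{\mathrm{adj}(2)}$ denote the estimated coefficients on $T_i$ and
$1 - T_i$, respectively. Then

$$\hat{e}_i = Y_i - \hat{\beta}_{\mathrm{adj}(1)} T_i - \hat{\beta}_{\mathrm{adj}(2)} (1 - T_i) - z_i \hat{Q}$$
$$= T_i\left[(a_i - z_i\hat{Q}) - \hat{\beta}_{\mathrm{adj}(1)}\right] + (1 - T_i)\left[(b_i - z_i\hat{Q}) - \hat{\beta}_{\mathrm{adj}(2)}\right]$$
$$= T_i\left[a_i^{**} - z_i(\hat{Q} - Q) - \hat{\beta}_{\mathrm{adj}(1)}\right] + (1 - T_i)\left[b_i^{**} - z_i(\hat{Q} - Q) - \hat{\beta}_{\mathrm{adj}(2)}\right].$$

Therefore,

$$\frac{1}{n_A}\sum_{i\in A} \hat{e}_i^2 = \frac{1}{n_A}\sum_{i\in A} \left[a_i^{**} - z_i(\hat{Q} - Q) - \hat{\beta}_{\mathrm{adj}(1)}\right]^2 = S_1 + S_2 + S_3 - 2S_4 - 2S_5 + 2S_6$$

where

$$S_1 = \frac{1}{n_A}\sum_{i\in A} (a_i^{**})^2,$$
$$S_2 = (\hat{Q} - Q)'\, \overline{z'z}_A\, (\hat{Q} - Q),$$
$$S_3 = \hat{\beta}_{\mathrm{adj}(1)}^2,$$
$$S_4 = \left(\frac{1}{n_A}\sum_{i\in A} a_i^{**} z_i\right)(\hat{Q} - Q),$$
$$S_5 = \hat{\beta}_{\mathrm{adj}(1)}\, \overline{a^{**}}_A,$$
$$S_6 = \hat{\beta}_{\mathrm{adj}(1)}\, \bar{z}_A (\hat{Q} - Q).$$

$S_1 \xrightarrow{p} \lim_{n\to\infty} \sigma^2_{a^{**}}$ by Lemma 7 and Condition 4.
The other terms are all $o_p(1)$:

• $S_2 \xrightarrow{p} 0$ because $\hat{Q} \xrightarrow{p} Q$ (by Lemma 4) and $\overline{z'z}_A \xrightarrow{p} \langle z'z \rangle$ (by Lemma 1).

• $S_3 \xrightarrow{p} 0$ because $\hat{\beta}_{\mathrm{adj}(1)} \xrightarrow{p} \bar{a} = 0$ (by Condition 4 and Lemma 6).

• $S_4 \xrightarrow{p} 0$ because

$$\frac{1}{n_A}\sum_{i\in A} a_i^{**} z_i = \frac{1}{n_A}\sum_{i\in A} (a_i - Q'z_i') z_i \xrightarrow{p} \langle az \rangle - Q'\langle z'z \rangle$$

(by Lemma 1) and $\hat{Q} \xrightarrow{p} Q$.

• $S_5 \xrightarrow{p} 0$ because $\overline{a^{**}}_A \xrightarrow{p} \langle a \rangle - \langle z \rangle Q = 0$ (by Lemma 1 and Condition
4) and $\hat{\beta}_{\mathrm{adj}(1)} \xrightarrow{p} 0$.

• $S_6 \xrightarrow{p} 0$ because $\bar{z}_A \xrightarrow{p} 0$ (by Lemma 1 and Condition 4), $\hat{\beta}_{\mathrm{adj}(1)} \xrightarrow{p} 0$,
and $\hat{Q} \xrightarrow{p} Q$.

Therefore,

$$\frac{1}{n_A}\sum_{i\in A} \hat{e}_i^2 \xrightarrow{p} \lim_{n\to\infty} \sigma^2_{a^{**}} .$$

Similarly,

$$\frac{1}{n - n_A}\sum_{i\in B} \hat{e}_i^2 \xrightarrow{p} \lim_{n\to\infty} \sigma^2_{b^{**}} .$$

Now note that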

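$$\frac{1}{n}\sum_{i\in A} \hat{e}_i^2 z_i = R_1 + R_2 + R_3 - 2R_4 - 2R_5 + 2R_6$$

where

$$R_1 = \frac{1}{n}\sum_{i\in A} a_i^2 z_i,$$
$$R_2 = \frac{1}{n}\sum_{i\in A} (z_i\hat{Q})^2 z_i,$$
$$R_3 = \hat{\beta}_{\mathrm{adj}(1)}^2\, \tilde{p}_A\, \bar{z}_A,$$
$$R_4 = \hat{Q}'\, \frac{1}{n}\sum_{i\in A} a_i z_i' z_i,$$
$$R_5 = \tilde{p}_A\, \hat{\beta}_{\mathrm{adj}(1)}\, \overline{az}_A,$$
$$R_6 = \tilde{p}_A\, \hat{\beta}_{\mathrm{adj}(1)}\, \hat{Q}'\, \overline{z'z}_A .$$

$R_3$, $R_5$, and $R_6$ are $o_p(1)$ because $\hat{\beta}_{\mathrm{adj}(1)} \xrightarrow{p} 0$, $\bar{z}_A \xrightarrow{p} 0$, and $\tilde{p}_A$, $\overline{az}_A$, $\overline{z'z}_A$,
and $\hat{Q}$ converge to finite limits (by Condition 3, Lemma 1, and Lemma 4).

$R_1$, $R_2$, and $R_4$ are $O_p(1)$, by Condition 1, Lemma 4, and repeated application
of the Cauchy-Schwarz inequality. For example, for $k = 1, \ldots, K$,
the kth element of $R_2$ is

$$\frac{1}{n}\sum_{i\in A} \left(\sum_{j=1}^{K} z_{ij}\hat{Q}_j\right)^2 z_{ik} = \sum_{j=1}^{K}\sum_{\ell=1}^{K} \hat{Q}_j \hat{Q}_\ell \left(\frac{1}{n}\sum_{i\in A} z_{ij} z_{i\ell} z_{ik}\right).$$

$\hat{Q}_j$ and $\hat{Q}_\ell$ are $O_p(1)$, and $n^{-1}\sum_{i\in A} z_{ij} z_{i\ell} z_{ik}$ is $O(1)$:

$$\left|\frac{1}{n}\sum_{i\in A} z_{ij} z_{i\ell} z_{ik}\right| \le \frac{1}{n}\sum_{i=1}^{n} |z_{ij} z_{i\ell}|\,|z_{ik}| \le \left(\frac{1}{n}\sum_{i=1}^{n} z_{ij}^2 z_{i\ell}^2\right)^{1/2} \left(\frac{1}{n}\sum_{i=1}^{n} z_{ik}^2\right)^{1/2}$$
$$\le \left(\frac{1}{n}\sum_{i=1}^{n} z_{ij}^4\right)^{1/4} \left(\frac{1}{n}\sum_{i=1}^{n} z_{i\ell}^4\right)^{1/4} \left(\frac{1}{n}\sum_{i=1}^{n} z_{ik}^4\right)^{1/4} < L^{3/4} .$$

Therefore, $R_2$ is $O_p(1)$.

Thus, $n^{-1}\sum_{i\in A} \hat{e}_i^2 z_i$ is $O_p(1)$. The proofs for $n^{-1}\sum_{i\in B} \hat{e}_i^2 z_i$ and $n^{-1}\sum_{i=1}^{n} \hat{e}_i^2 z_i' z_i$
are similar. □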

A.3. Proof of Theorem 1. We can assume Condition 4 without loss
of generality, by an argument similar to that given in the proof of Lemma
6. Then ATE = 0, and by Lemma 3 and Condition 4,

$$\sqrt{n}(\widehat{\mathrm{ATE}}_{\mathrm{interact}} - \mathrm{ATE}) = \sqrt{n}\left[(\bar{a}_A - \bar{z}_A\hat{Q}_a) - (\bar{b}_B - \bar{z}_B\hat{Q}_b)\right]$$
$$= \sqrt{n}\left[(\bar{a}_A - \bar{z}_A Q_a) - (\bar{b}_B - \bar{z}_B Q_b)\right] - \sqrt{n}\,\bar{z}_A(\hat{Q}_a - Q_a) + \sqrt{n}\,\bar{z}_B(\hat{Q}_b - Q_b) .$$

By a finite-population Central Limit Theorem [Freedman's (2008b) Theorem
1], $\sqrt{n}\,\bar{z}_A$ and $\sqrt{n}\,\bar{z}_B$ are $O_p(1)$, and by Lemma 5, $\hat{Q}_a - Q_a$ and $\hat{Q}_b - Q_b$
are $o_p(1)$. Therefore, $\sqrt{n}\,\bar{z}_A(\hat{Q}_a - Q_a)$ and $\sqrt{n}\,\bar{z}_B(\hat{Q}_b - Q_b)$ are $o_p(1)$.

The conclusion follows from Freedman's (2008b) Theorem 1 with $a$ and $b$
replaced by $a - zQ_a$ and $b - zQ_b$.



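A.4. Proof of Corollary 1.1. We can assume Condition 4 without loss of generality: centering $a_i$, $b_i$, and $z_i$ has no effect on $\widehat{ATE}_{\mathrm{interact}} - ATE$, $\widehat{ATE}_{\mathrm{unadj}} - ATE$, $Q_a$, $Q_b$, or $\sigma^2_E$.

Note that:
\begin{align*}
\lim_{n\to\infty}\sigma^2_{a^*} &= \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}(a_i - z_i Q_a)^2
 = \langle a^2\rangle - \langle az\rangle\langle z'z\rangle^{-1}\langle az\rangle', \\
\lim_{n\to\infty}\sigma^2_{b^*} &= \langle b^2\rangle - \langle bz\rangle\langle z'z\rangle^{-1}\langle bz\rangle', \\
\lim_{n\to\infty}\sigma_{a^*,b^*} &= \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}(a_i - z_i Q_a)(b_i - z_i Q_b) \\
 &= \langle ab\rangle - \langle az\rangle Q_b - \langle bz\rangle Q_a + Q_a'\langle z'z\rangle Q_b \\
 &= \langle ab\rangle - \langle az\rangle\langle z'z\rangle^{-1}\langle bz\rangle'.
\end{align*}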

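By Freedman's (2008b) Theorem 1,
\begin{align*}
\mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{unadj}} - ATE]) &= \mathrm{avar}(\sqrt{n}[\bar{a}_A - \bar{b}_B]) \\
&= \frac{1-p_A}{p_A}\langle a^2\rangle + \frac{p_A}{1-p_A}\langle b^2\rangle + 2\langle ab\rangle.
\end{align*}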

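Let
\[
\Delta = \mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{unadj}} - ATE]) - \mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{interact}} - ATE]).
\]
Then
\begin{align*}
\Delta &= \frac{1-p_A}{p_A}\langle az\rangle\langle z'z\rangle^{-1}\langle az\rangle'
 + \frac{p_A}{1-p_A}\langle bz\rangle\langle z'z\rangle^{-1}\langle bz\rangle'
 + 2\langle az\rangle\langle z'z\rangle^{-1}\langle bz\rangle' \\
&= \frac{1}{p_A(1-p_A)}\,Q_E'\langle z'z\rangle Q_E
 = \frac{1}{p_A(1-p_A)}\lim_{n\to\infty}\sigma^2_E \ge 0.
\end{align*}

The matrix $\langle z'z\rangle$ is positive definite, so $\Delta = 0$ if and only if $Q_E = 0$.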
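The algebra in this proof can be verified on a finite population, with population moments standing in for the limits. The sketch below is an illustration only: the function name `gain_over_unadjusted` and the simulated potential outcomes are assumptions, not part of the paper.

```python
import numpy as np


def gain_over_unadjusted(a, b, z, p_a):
    """Return Delta from Corollary 1.1 computed two ways.

    a, b : length-n arrays of potential outcomes (hypothetical data)
    z    : (n, K) covariate matrix
    p_a  : treated fraction, strictly between 0 and 1
    Finite-population moments replace the limits used in the proof.
    """
    # Condition 4: center the potential outcomes and covariates.
    a, b, z = a - a.mean(), b - b.mean(), z - z.mean(axis=0)
    n = len(a)
    zz = z.T @ z / n
    q_a = np.linalg.solve(zz, z.T @ a / n)  # population slope of a on z
    q_b = np.linalg.solve(zz, z.T @ b / n)  # population slope of b on z

    def avar(u, v):
        # Variance formula from Freedman's Theorem 1 applied to (u, v).
        return ((1 - p_a) / p_a * np.mean(u * u)
                + p_a / (1 - p_a) * np.mean(v * v)
                + 2 * np.mean(u * v))

    direct = avar(a, b) - avar(a - z @ q_a, b - z @ q_b)
    q_e = (1 - p_a) * q_a + p_a * q_b
    closed_form = q_e @ zz @ q_e / (p_a * (1 - p_a))
    return direct, closed_form


rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
a = z @ np.array([1.0, 2.0]) + rng.normal(size=500)
b = z @ np.array([-1.0, 0.5]) + rng.normal(size=500)
print(gain_over_unadjusted(a, b, z, p_a=0.3))
```

The two returned values agree up to floating-point error, and the closed form is a quadratic form in a positive semidefinite matrix, so it cannot be negative.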

A.5. Proof of remark (iv) after Corollary 1.1. Suppose there are three treatment groups, A, B, and C, with associated dummy variables $U_i$, $V_i$, and $W_i$ and potential outcomes $a_i$, $b_i$, and $c_i$. Let $ATE = \bar{a} - \bar{b}$, and let $\widehat{ATE}_{\mathrm{interact}}$ be the difference between the estimated coefficients on $U_i$ and $V_i$ in the no-intercept OLS regression of $Y_i$ on $U_i$, $V_i$, $W_i$, $z_i - \bar{z}$, $U_i(z_i - \bar{z})$, and $W_i(z_i - \bar{z})$.

Assume the three groups are of fixed sizes $n_A$, $n_B$, and $n - n_A - n_B$. Assume regularity conditions analogous to Conditions 1-3: for example, $n_A/n \to p_A$ and $n_B/n \to p_B$, where $p_A > 0$, $p_B > 0$, and $p_A + p_B < 1$. Without loss of generality, assume Condition 4.

Then $\sqrt{n}(\widehat{ATE}_{\mathrm{interact}} - ATE)$ converges in distribution to a Gaussian random variable with mean 0 and variance
\[
\frac{1-p_A}{p_A}\lim_{n\to\infty}\sigma^2_{a^*} + \frac{1-p_B}{p_B}\lim_{n\to\infty}\sigma^2_{b^*} + 2\lim_{n\to\infty}\sigma_{a^*,b^*}.
\]
The proof is essentially the same as that of Theorem 1.

Let $\widehat{ATE}_{\mathrm{unadj}} = \bar{Y}_A - \bar{Y}_B$. By Freedman's (2008b) Theorem 1, the asymptotic variance of $\sqrt{n}(\widehat{ATE}_{\mathrm{unadj}} - ATE)$ is
\[
\frac{1-p_A}{p_A}\langle a^2\rangle + \frac{1-p_B}{p_B}\langle b^2\rangle + 2\langle ab\rangle.
\]

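Let
\[
\Delta = \mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{unadj}} - ATE]) - \mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{interact}} - ATE]).
\]
Then
\begin{align*}
\Delta &= \frac{1-p_A}{p_A}\langle az\rangle\langle z'z\rangle^{-1}\langle az\rangle'
 + \frac{1-p_B}{p_B}\langle bz\rangle\langle z'z\rangle^{-1}\langle bz\rangle'
 + 2\langle az\rangle\langle z'z\rangle^{-1}\langle bz\rangle' \\
&= \frac{1-p_A}{p_A}\langle az\rangle\langle z'z\rangle^{-1}\langle az\rangle'
 + \frac{p_A}{1-p_A}\langle bz\rangle\langle z'z\rangle^{-1}\langle bz\rangle'
 + 2\langle az\rangle\langle z'z\rangle^{-1}\langle bz\rangle' \\
&\qquad + \Bigl(\frac{1-p_B}{p_B} - \frac{p_A}{1-p_A}\Bigr)\langle bz\rangle\langle z'z\rangle^{-1}\langle bz\rangle' \\
&= \frac{1}{p_A(1-p_A)}\lim_{n\to\infty}\sigma^2_E
 + \Bigl(\frac{1-p_B}{p_B} - \frac{p_A}{1-p_A}\Bigr)Q_b'\langle z'z\rangle Q_b,
\end{align*}
where $E_i = (z_i - \bar{z})Q_E$ and $Q_E = (1-p_A)Q_a + p_A Q_b$.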
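Similarly,
\[
\Delta = \frac{1}{p_B(1-p_B)}\lim_{n\to\infty}\sigma^2_F
 + \Bigl(\frac{1-p_A}{p_A} - \frac{p_B}{1-p_B}\Bigr)Q_a'\langle z'z\rangle Q_a,
\]
where $F_i = (z_i - \bar{z})Q_F$ and $Q_F = p_B Q_a + (1-p_B)Q_b$.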
The condition $p_A + p_B < 1$ implies
\[
\frac{1-p_B}{p_B} > \frac{p_A}{1-p_A}, \qquad \frac{1-p_A}{p_A} > \frac{p_B}{1-p_B}.
\]
Also, $\langle z'z\rangle$ is positive definite. Therefore, $\Delta \ge 0$, and the inequality is strict unless $Q_a = 0$ and $Q_b = 0$.

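The proof extends to designs with more than three treatment groups.

A.6. Proof of Corollary 1.2. Again, we can assume Condition 4 without loss of generality. By Lemma 6,
\begin{align*}
\mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{adj}} - ATE])
&= \frac{1-p_A}{p_A}\lim_{n\to\infty}\sigma^2_{a^{**}} + \frac{p_A}{1-p_A}\lim_{n\to\infty}\sigma^2_{b^{**}} + 2\lim_{n\to\infty}\sigma_{a^{**},b^{**}} \\
&= \frac{1-p_A}{p_A}\bigl[\langle a^2\rangle + Q'\langle z'z\rangle Q - 2Q'\langle az\rangle'\bigr] \\
&\qquad + \frac{p_A}{1-p_A}\bigl[\langle b^2\rangle + Q'\langle z'z\rangle Q - 2Q'\langle bz\rangle'\bigr] \\
&\qquad + 2\bigl[\langle ab\rangle + Q'\langle z'z\rangle Q - Q'\langle az\rangle' - Q'\langle bz\rangle'\bigr] \\
&= \frac{1-p_A}{p_A}\langle a^2\rangle + \frac{p_A}{1-p_A}\langle b^2\rangle + 2\langle ab\rangle \\
&\qquad + \frac{1}{p_A(1-p_A)}Q'\langle z'z\rangle Q - \frac{2}{p_A}Q'\langle az\rangle' - \frac{2}{1-p_A}Q'\langle bz\rangle'.
\end{align*}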

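Let
\[
\Delta = \mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{adj}} - ATE]) - \mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{interact}} - ATE]).
\]
Then
\begin{align*}
\Delta &= \frac{1}{p_A(1-p_A)}Q'\langle z'z\rangle Q - \frac{2}{p_A}Q'\langle az\rangle' - \frac{2}{1-p_A}Q'\langle bz\rangle' \\
&\qquad + \frac{1-p_A}{p_A}\langle az\rangle\langle z'z\rangle^{-1}\langle az\rangle'
 + \frac{p_A}{1-p_A}\langle bz\rangle\langle z'z\rangle^{-1}\langle bz\rangle'
 + 2\langle az\rangle\langle z'z\rangle^{-1}\langle bz\rangle' \\
&= \Bigl(\frac{p_A}{1-p_A} - 2 + \frac{1-p_A}{p_A}\Bigr)
 \bigl[\langle az\rangle\langle z'z\rangle^{-1}\langle az\rangle' + \langle bz\rangle\langle z'z\rangle^{-1}\langle bz\rangle' - 2\langle az\rangle\langle z'z\rangle^{-1}\langle bz\rangle'\bigr] \\
&= \frac{(2p_A-1)^2}{p_A(1-p_A)}(Q_a - Q_b)'\langle z'z\rangle(Q_a - Q_b) \\
&= \frac{(2p_A-1)^2}{p_A(1-p_A)}\lim_{n\to\infty}\sigma^2_D \ge 0.
\end{align*}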
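The same finite-population check applies to this identity. In the sketch below, the pooled slope is taken to be $Q = p_A Q_a + (1-p_A)Q_b$, which is the value that makes the algebra above go through; the function name `gain_over_pooled_adjustment` and the simulated data are illustrative assumptions.

```python
import numpy as np


def gain_over_pooled_adjustment(a, b, z, p_a):
    """Return Delta from Corollary 1.2 computed two ways.

    a, b : length-n arrays of potential outcomes (hypothetical data)
    z    : (n, K) covariate matrix
    p_a  : treated fraction, strictly between 0 and 1
    """
    # Condition 4: center the potential outcomes and covariates.
    a, b, z = a - a.mean(), b - b.mean(), z - z.mean(axis=0)
    n = len(a)
    zz = z.T @ z / n
    q_a = np.linalg.solve(zz, z.T @ a / n)
    q_b = np.linalg.solve(zz, z.T @ b / n)
    q = p_a * q_a + (1 - p_a) * q_b  # assumed limit of the pooled OLS slope

    def avar(u, v):
        return ((1 - p_a) / p_a * np.mean(u * u)
                + p_a / (1 - p_a) * np.mean(v * v)
                + 2 * np.mean(u * v))

    direct = avar(a - z @ q, b - z @ q) - avar(a - z @ q_a, b - z @ q_b)
    d = z @ (q_a - q_b)  # D_i = (z_i - zbar)(Q_a - Q_b)
    closed_form = (2 * p_a - 1) ** 2 / (p_a * (1 - p_a)) * np.mean(d * d)
    return direct, closed_form


rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
a = z @ np.array([1.0, 2.0]) + rng.normal(size=500)
b = z @ np.array([-1.0, 0.5]) + rng.normal(size=500)
for p_a in (0.2, 0.5, 0.8):
    print(p_a, gain_over_pooled_adjustment(a, b, z, p_a))
```

The factor $(2p_A - 1)^2$ makes the gain vanish in a balanced design ($p_A = 1/2$), so the loop shows the difference shrinking to zero at $p_A = 0.5$.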

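A.7. Outline of proof of remark (iii) after Corollary 1.2. Without loss of generality, assume Condition 4. From the proof of Theorem 1,
\[
\sqrt{n}\,\widehat{ATE}_{\mathrm{interact}} = \sqrt{n}[(\bar{a}_A - \bar{z}_A Q_a) - (\bar{b}_B - \bar{z}_B Q_b)] + o_p(1).
\]
By Condition 4, $\tilde{p}_A\bar{z}_A + (1-\tilde{p}_A)\bar{z}_B = 0$. Therefore, $\bar{z}_A = (1-\tilde{p}_A)(\bar{z}_A - \bar{z}_B)$ and $\bar{z}_B = -\tilde{p}_A(\bar{z}_A - \bar{z}_B)$. It follows that
\[
\sqrt{n}\,\widehat{ATE}_{\mathrm{interact}} = \sqrt{n}\{\bar{a}_A - \bar{b}_B - (\bar{z}_A - \bar{z}_B)[(1-\tilde{p}_A)Q_a + \tilde{p}_A Q_b]\} + o_p(1).
\]

Now let $\widehat{ATE}_{\mathrm{tyranny}}$ and $\hat{Q}_{\mathrm{tyranny}}$ be the estimated coefficients on $T_i$ and $z_i$ from a weighted least squares regression of $Y_i$ on $T_i$ and $z_i$, with weights
\[
w_i = \frac{1-\tilde{p}_A}{\tilde{p}_A}\,T_i + \frac{\tilde{p}_A}{1-\tilde{p}_A}\,(1-T_i).
\]

It can be shown that $\hat{Q}_{\mathrm{tyranny}} \stackrel{p}{\to} (1-p_A)Q_a + p_A Q_b$. The proof is similar to that of Lemma 4, after noting that weighted least squares is equivalent to OLS with all data values (including the constant) multiplied by $\sqrt{w_i}$. It follows that
\[
\sqrt{n}\,\widehat{ATE}_{\mathrm{tyranny}} = \sqrt{n}\{\bar{a}_A - \bar{b}_B - (\bar{z}_A - \bar{z}_B)[(1-\tilde{p}_A)Q_a + \tilde{p}_A Q_b]\} + o_p(1).
\]

The proof is similar to arguments in the proofs of Lemmas 2 and 6. Therefore, $\sqrt{n}(\widehat{ATE}_{\mathrm{tyranny}} - \widehat{ATE}_{\mathrm{interact}}) \stackrel{p}{\to} 0$.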
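The equivalence between weighted least squares and OLS on data scaled by $\sqrt{w_i}$ can be illustrated in a few lines. This is a minimal sketch under stated assumptions: the function name `tyranny_wls` is hypothetical, the weights use the sample treated fraction, and the toy data are constructed so that each group has an exact linear fit and the same covariate spread.

```python
import numpy as np


def tyranny_wls(y, t, z):
    """Weighted least squares of y on (1, t, z) via OLS on sqrt(w)-scaled data.

    y : outcomes; t : 0/1 treatment indicator; z : (n,) or (n, K) covariates.
    Weights use the sample treated fraction p (an assumption of this sketch):
    (1 - p) / p for treated units and p / (1 - p) for controls.
    Returns (coefficient on t, slope vector on z).
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(y), -1)
    p = t.mean()
    w = np.where(t == 1, (1 - p) / p, p / (1 - p))
    X = np.column_stack([np.ones(len(y)), t, z])
    root_w = np.sqrt(w)
    # Scaling every column (constant included) and y by sqrt(w) turns WLS into OLS.
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return coef[1], coef[2:]


# Hypothetical data: exact slopes 2 (treated) and 5 (control), equal covariate spread.
t = np.array([1] * 4 + [0] * 12)
z = np.tile([-1.0, 1.0], 8)
y = np.where(t == 1, 1.0 + 2.0 * z, 5.0 * z)
ate_hat, slope = tyranny_wls(y, t, z)
print(ate_hat, slope)
```

With a treated fraction of 0.25, the fitted slope is $(1 - 0.25)\cdot 2 + 0.25\cdot 5 = 2.75$, so the smaller group's slope gets the larger weight, which matches the limit $(1-p_A)Q_a + p_A Q_b$ stated above.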

A.8. Proof of Theorem 2. We can assume Condition 4 without loss of generality, by arguments similar to those given in the proofs of Lemmas 4, 6, and 8.

By Lemma 8, $n\hat{v}_{\mathrm{adj}} = M_{11} + M_{22} - 2M_{12}$, where
\[
M = (n^{-1}X'X)^{-1}\Bigl(n^{-1}\sum_{i=1}^{n}\hat{e}_i^2 x_i' x_i\Bigr)(n^{-1}X'X)^{-1}.
\]
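Using Condition 4,
\[
n^{-1}X'X = \begin{pmatrix} C & D \\ D' & n^{-1}\sum_{i=1}^{n} z_i' z_i \end{pmatrix},
\]
where
\[
C = \begin{pmatrix} \tilde{p}_A & 0 \\ 0 & 1-\tilde{p}_A \end{pmatrix}, \qquad
D = \begin{pmatrix} \tilde{p}_A\bar{z}_A \\ (1-\tilde{p}_A)\bar{z}_B \end{pmatrix}.
\]

By Conditions 2-4 and Lemma 1, $\tilde{p}_A \to p_A$, $\bar{z}_A \stackrel{p}{\to} 0$, $\bar{z}_B \stackrel{p}{\to} 0$, and $\langle z'z\rangle$ is invertible. Therefore,
\[
(n^{-1}X'X)^{-1} \stackrel{p}{\to} \begin{pmatrix} F & 0 \\ 0 & \langle z'z\rangle^{-1} \end{pmatrix},
\]
where
\[
F = \begin{pmatrix} 1/p_A & 0 \\ 0 & 1/(1-p_A) \end{pmatrix}.
\]

Also,
\[
x_i' x_i = \begin{pmatrix} G & H \\ H' & z_i' z_i \end{pmatrix},
\]
where
\[
G = \begin{pmatrix} T_i & 0 \\ 0 & 1-T_i \end{pmatrix}, \qquad
H = \begin{pmatrix} T_i z_i \\ (1-T_i) z_i \end{pmatrix}.
\]
So
\[
n^{-1}\sum_{i=1}^{n}\hat{e}_i^2 x_i' x_i = \begin{pmatrix} K & L \\ L' & n^{-1}\sum_{i=1}^{n}\hat{e}_i^2 z_i' z_i \end{pmatrix},
\]
where
\[
K = \begin{pmatrix} \tilde{p}_A\, n_A^{-1}\sum_{i\in A}\hat{e}_i^2 & 0 \\ 0 & (1-\tilde{p}_A)(n-n_A)^{-1}\sum_{i\in B}\hat{e}_i^2 \end{pmatrix}, \qquad
L = \begin{pmatrix} n^{-1}\sum_{i\in A}\hat{e}_i^2 z_i \\ n^{-1}\sum_{i\in B}\hat{e}_i^2 z_i \end{pmatrix}.
\]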



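By Lemma 9 and Condition 3, $L$ and $n^{-1}\sum_{i=1}^{n}\hat{e}_i^2 z_i' z_i$ are $O_p(1)$, and
\[
K \stackrel{p}{\to} \begin{pmatrix} p_A\lim_{n\to\infty}\sigma^2_{a^{**}} & 0 \\ 0 & (1-p_A)\lim_{n\to\infty}\sigma^2_{b^{**}} \end{pmatrix}.
\]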

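The above results imply that the upper-left $2\times 2$ block of $M$ converges in probability to
\begin{align*}
&\begin{pmatrix} 1/p_A & 0 \\ 0 & 1/(1-p_A) \end{pmatrix}
\begin{pmatrix} p_A\lim_{n\to\infty}\sigma^2_{a^{**}} & 0 \\ 0 & (1-p_A)\lim_{n\to\infty}\sigma^2_{b^{**}} \end{pmatrix}
\begin{pmatrix} 1/p_A & 0 \\ 0 & 1/(1-p_A) \end{pmatrix} \\
&\qquad = \begin{pmatrix} p_A^{-1}\lim_{n\to\infty}\sigma^2_{a^{**}} & 0 \\ 0 & (1-p_A)^{-1}\lim_{n\to\infty}\sigma^2_{b^{**}} \end{pmatrix}.
\end{align*}

Thus,
\[
n\hat{v}_{\mathrm{adj}} \stackrel{p}{\to} \frac{1}{p_A}\lim_{n\to\infty}\sigma^2_{a^{**}} + \frac{1}{1-p_A}\lim_{n\to\infty}\sigma^2_{b^{**}}.
\]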

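Lemma 6 implies
\[
\mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{adj}} - ATE]) = \frac{1-p_A}{p_A}\lim_{n\to\infty}\sigma^2_{a^{**}} + \frac{p_A}{1-p_A}\lim_{n\to\infty}\sigma^2_{b^{**}} + 2\lim_{n\to\infty}\sigma_{a^{**},b^{**}}.
\]

Let $\Delta = \operatorname{plim}\, n\hat{v}_{\mathrm{adj}} - \mathrm{avar}(\sqrt{n}[\widehat{ATE}_{\mathrm{adj}} - ATE])$. Then
\begin{align*}
\Delta &= \lim_{n\to\infty}\sigma^2_{a^{**}} + \lim_{n\to\infty}\sigma^2_{b^{**}} - 2\lim_{n\to\infty}\sigma_{a^{**},b^{**}} \\
&= \lim_{n\to\infty}\sigma^2_{(a^{**}-b^{**})} = \lim_{n\to\infty}\sigma^2_{(a-b)} \ge 0.
\end{align*}

The proof for $n\hat{v}_{\mathrm{interact}}$ is similar.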
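For readers who want to compute the quantity analyzed in this proof, the sketch below evaluates $M_{11} + M_{22} - 2M_{12}$ directly from data. It assumes the rows of $X$ are $x_i = (T_i, 1-T_i, z_i)$, as the block structure above indicates, and it uses the sandwich form without any degrees-of-freedom correction, matching the definition of $M$; the function name `n_times_sandwich_var_adj` and the simulated data are illustrative.

```python
import numpy as np


def n_times_sandwich_var_adj(y, t, z):
    """Return M11 + M22 - 2*M12, i.e. n times the sandwich variance estimate.

    y : outcomes; t : 0/1 treatment indicator; z : (n,) or (n, K) covariates.
    X has rows (t_i, 1 - t_i, z_i); no degrees-of-freedom correction is applied.
    """
    n = len(y)
    X = np.column_stack([t, 1 - t, z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X / n)
    meat = (X * (resid ** 2)[:, None]).T @ X / n  # n^{-1} sum e_i^2 x_i' x_i
    M = bread @ meat @ bread
    return M[0, 0] + M[1, 1] - 2 * M[0, 1]


rng = np.random.default_rng(0)
n = 400
t = (rng.random(n) < 0.3).astype(float)
z = rng.normal(size=(n, 2))
y = 0.5 * t + (z @ np.array([1.0, -2.0])) * (1 + t) + rng.normal(size=n)
print(n_times_sandwich_var_adj(y, t, z))
```

Because the regression on $(T_i, 1-T_i, z_i)$ spans the same column space as the regression on a constant, $T_i$, and $z_i$, the value returned equals $n$ times the uncorrected sandwich variance of the coefficient on $T_i$ in the intercept form.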